Abstract This paper proposes scalable and fast algorithms for solving the Robust PCA problem, namely recovering a low-rank matrix with an unknown fraction of its entries being arbitrarily corrupted. This problem arises in many applications, such as image processing, web data ranking, and bioinformatic data analysis. It was recently shown that under surprisingly broad conditions, the Robust PCA problem can be exactly solved via convex optimization that minimizes a combination of the nuclear norm and the $\ell_1$-norm. In this paper, we apply the method of augmented Lagrange multipliers (ALM) to solve this convex program. As the objective function is non-smooth, we show how to extend the classical analysis of ALM to such new objective functions, prove the optimality of the proposed algorithms, and characterize their convergence rate. Empirically, the proposed new algorithms can be more than five times faster than the previous state-of-the-art algorithms for Robust PCA, such as the accelerated proximal gradient (APG) algorithm. Moreover, the new algorithms achieve higher precision while being less demanding in storage/memory. We also show that the ALM technique can be used to solve the (related but somewhat simpler) matrix completion problem, with rather promising results too. We further prove the necessary and sufficient condition for the inexact ALM to converge globally. Matlab code for all algorithms discussed is available at http://perception.csl.illinois.edu/matrix-rank/home.html

Keywords Low-rank matrix recovery or completion • Robust principal component 
1 Introduction

Principal Component Analysis (PCA), as a popular tool for high-dimensional data processing, analysis, compression, and visualization, has wide applications in scientific and engineering fields [15]. It assumes that the given high-dimensional data lie near a much lower-dimensional linear subspace. To a large extent, the goal of PCA is to efficiently and accurately estimate this low-dimensional subspace.

Suppose that the given data are arranged as the columns of a large matrix $D \in \mathbb{R}^{m \times n}$. The mathematical model for estimating the low-dimensional subspace is to find a low-rank matrix $A$ such that the discrepancy between $A$ and $D$ is minimized, leading to the following constrained optimization:

$$\min_{A,E} \|E\|_F, \quad \text{subject to} \quad \operatorname{rank}(A) \le r, \; D = A + E, \tag{1}$$

where $r \ll \min(m, n)$ is the target dimension of the subspace and $\|\cdot\|_F$ is the Frobenius norm, which corresponds to assuming that the data are corrupted by i.i.d. Gaussian noise. This problem can be conveniently solved by first computing the Singular Value Decomposition (SVD) of $D$ and then projecting the columns of $D$ onto the subspace spanned by the $r$ principal left singular vectors of $D$ [15].

As PCA gives the optimal estimate when the corruption is caused by additive i.i.d. Gaussian noise, it works well in practice as long as the magnitude of the noise is small. However, it breaks down under large corruption, even if that corruption affects only very few of the observations. In fact, even if only one entry of $A$ is arbitrarily corrupted, the estimate of $A$ obtained by classical PCA can be arbitrarily far from the true $A$. Therefore, it is necessary to investigate whether a low-rank matrix $A$ can still be efficiently and accurately recovered from a corrupted data matrix $D = A + E$, where some entries of the additive errors $E$ may be arbitrarily large.

Recently, Wright et al. [26] have shown that under rather broad conditions the answer is affirmative: as long as the error matrix $E$ is sufficiently sparse (relative to the rank of $A$), one can exactly recover the low-rank matrix $A$ from $D = A + E$ by solving the following convex optimization problem:

$$\min_{A,E} \|A\|_* + \lambda \|E\|_1, \quad \text{subject to} \quad D = A + E, \tag{2}$$

where $\|\cdot\|_*$ denotes the nuclear norm of a matrix (i.e., the sum of its singular values), $\|\cdot\|_1$ denotes the sum of the absolute values of the matrix entries, and $\lambda$ is a positive weighting parameter. Due to its ability to exactly recover the underlying low-rank structure in the data, even in the presence of large errors or outliers, this optimization is referred to as Robust PCA (RPCA) in [26] (a popular term that has been used by a long line of work that aims to render PCA robust to outliers and gross corruption). Several applications of RPCA, e.g., background modeling and removing shadows and specularities from face images, have been demonstrated in [26] to show the advantage of RPCA.

The optimization (2) can be treated as a general convex optimization problem and solved by any off-the-shelf interior point solver (e.g., CVX [12]) after being reformulated as a semidefinite program [10]. However, although interior point methods normally take very few iterations to converge, they have difficulty in handling large matrices because the complexity of computing the step direction is $O(m^6)$, where $m$ is the dimension of the matrix. As a result, on a typical personal computer (PC), generic interior point solvers cannot handle matrices with dimensions larger than $m = 10^2$.
In contrast, applications in image and video processing often involve matrices of dimension $m = 10^4$ to $10^5$, and applications in web search and bioinformatics can easily involve matrices of dimension $m = 10^6$ and beyond. So generic interior point solvers are too limited for Robust PCA to be practical for many real applications.

That interior point solvers do not scale well for large matrices is because they rely on second-order information about the objective function. To overcome the scalability issue, we should use first-order information only and fully harness the special properties of this class of convex optimization problems. For example, it has recently been shown that (first-order) iterative thresholding (IT) algorithms can be very efficient for $\ell_1$-norm minimization problems arising in compressed sensing [28, T, 29, H]. It has also been shown in [7] that the same techniques can be used to minimize the nuclear norm for the matrix completion (MC) problem, namely recovering a low-rank matrix from an incomplete but clean subset of its entries [21, 9].

As the matrix recovery (Robust PCA) problem involves minimizing a combination of both the $\ell_1$-norm and the nuclear norm, in the original paper [26] the authors also adopted the iterative thresholding technique to solve (2) and obtained similar convergence and scalability properties. However, the iterative thresholding scheme proposed in [26] converges extremely slowly. Typically, it requires about $10^4$ iterations to converge, with each iteration having the same cost as one SVD. As a result, even for matrices with dimensions as small as $m = 800$, the algorithm has to run 8 hours on a typical PC. To alleviate the slow convergence of the iterative thresholding method [26], Lin et al. [18] have proposed two new algorithms for solving the problem, which are in some sense complementary to each other. The first one is an accelerated proximal gradient (APG) algorithm applied to the primal, which is a direct application of the FISTA framework introduced by [4], coupled with a fast continuation technique¹. The second one is a gradient-ascent algorithm applied to the dual of the problem. From simulations with matrices of dimension up to $m = 1{,}000$, both methods are at least 50 times faster than the iterative thresholding method (see [18] for more details).

In this paper, we present novel algorithms for matrix recovery which utilize techniques of augmented Lagrange multipliers (ALM). The exact ALM (EALM) method to be proposed here is proven to have a pleasing Q-linear convergence speed, while APG is in theory only sub-linear. A slight improvement over the exact ALM leads to an inexact ALM (IALM) method, which converges practically as fast as the exact ALM but requires significantly fewer partial SVDs. Experimental results show that IALM is at least five times faster than APG, and its precision is also higher. In particular, the number of non-zeros in $E$ computed by IALM is much more accurate (actually, often exact) than that computed by APG, which often leaves many small non-zero terms in $E$. The necessary and sufficient condition for IALM to converge globally is also proven.

In the rest of the paper, for completeness, we first sketch previous work in Section 2. Then we present our new ALM-based algorithms and analyze their convergence properties in Section 3 (leaving all technical proofs to Appendix A). We also briefly illustrate how the same ALM method can be easily adapted to solve the (related but somewhat simpler) matrix completion (MC) problem. We then discuss some implementation details of our algorithms in Section 4. Next, in Section 5, we compare the new algorithms with other existing algorithms for both matrix recovery and matrix completion, using extensive simulations on randomly generated matrices. Finally, we give some concluding remarks in Section 6.

¹ Similar techniques have been applied to the matrix completion problem by [23].

2 Previous Algorithms for Matrix Recovery 

In this section, for completeness as well as for the purpose of comparison, we briefly introduce and summarize other existing algorithms for solving the matrix recovery problem (2).

2.1 The Iterative Thresholding Approach 

The IT approach proposed in [26] solves a relaxed convex problem of (2):

$$\min_{A,E} \|A\|_* + \lambda \|E\|_1 + \frac{1}{2\tau}\|A\|_F^2 + \frac{1}{2\tau}\|E\|_F^2, \quad \text{subject to} \quad A + E = D, \tag{3}$$

where $\tau$ is a large positive scalar so that the objective function is only perturbed slightly. By introducing a Lagrange multiplier $Y$ to remove the equality constraint, one has the Lagrangian function of (3):

$$L(A, E, Y) = \|A\|_* + \lambda \|E\|_1 + \frac{1}{2\tau}\|A\|_F^2 + \frac{1}{2\tau}\|E\|_F^2 + \frac{1}{\tau}\langle Y, D - A - E \rangle. \tag{4}$$

Then the IT approach updates $A$, $E$ and $Y$ iteratively. It updates $A$ and $E$ by minimizing $L(A, E, Y)$ with respect to $A$ and $E$, with $Y$ fixed. Then the amount of violation of the constraint $A + E = D$ is used to update $Y$.

For convenience, we introduce the following soft-thresholding (shrinkage) operator:

$$\mathcal{S}_\varepsilon[x] = \begin{cases} x - \varepsilon, & \text{if } x > \varepsilon, \\ x + \varepsilon, & \text{if } x < -\varepsilon, \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$

where $x \in \mathbb{R}$ and $\varepsilon > 0$. This operator can be extended to vectors and matrices by applying it element-wise. Then the IT approach works as described in Algorithm 1, where the thresholdings directly follow from the well-known analysis [7, 28]:

$$U \mathcal{S}_\varepsilon[S] V^T = \arg\min_X \varepsilon \|X\|_* + \frac{1}{2}\|X - W\|_F^2, \qquad \mathcal{S}_\varepsilon[W] = \arg\min_X \varepsilon \|X\|_1 + \frac{1}{2}\|X - W\|_F^2, \tag{6}$$

where $U S V^T$ is the SVD of $W$. Although extremely simple and provably correct, the IT algorithm requires a very large number of iterations to converge, and it is difficult to choose the step size $\delta_k$ for speedup; hence its applicability is limited.
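The two operators in (5) and (6) are the nonlinear building blocks of every thresholding-based algorithm in this paper. Below is a minimal NumPy sketch, assuming dense real matrices and a full SVD (rather than the partial SVD discussed in Section 4); the names `shrink` and `svt` are our own and are not taken from the authors' Matlab code.

```python
import numpy as np


def shrink(X, eps):
    """Soft-thresholding S_eps[X] of (5), applied element-wise."""
    # x - eps if x > eps, x + eps if x < -eps, and 0 otherwise
    return np.sign(X) * np.maximum(np.abs(X) - eps, 0.0)


def svt(W, eps):
    """Singular value thresholding U S_eps[S] V^T, the nuclear-norm prox in (6)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * shrink(s, eps)) @ Vt
```

For example, `shrink` maps the vector `[3, -0.5, -2]` with `eps = 1` to `[2, 0, -1]`, and `svt` lowers every singular value by `eps` and clips at zero, which is how it can reduce the rank of its argument.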

2.2 The Accelerated Proximal Gradient Approach 

A general theory of the accelerated proximal gradient approach can be found in [25, 14, 20]. To solve the following unconstrained convex problem:

$$\min_X F(X) = g(X) + f(X), \tag{7}$$
Algorithm 1 (RPCA via Iterative Thresholding)

Input: Observation matrix $D \in \mathbb{R}^{m \times n}$, weights $\lambda$ and $\tau$.
1: while not converged do
2: $(U, S, V) = \mathrm{svd}(Y_{k-1})$,
3: $A_k = U \mathcal{S}_\tau[S] V^T$,
4: $E_k = \mathcal{S}_{\lambda\tau}[Y_{k-1}]$,
5: $Y_k = Y_{k-1} + \delta_k (D - A_k - E_k)$.
6: end while
Output: $A \leftarrow A_k$, $E \leftarrow E_k$.



where $\mathcal{H}$ is a real Hilbert space endowed with an inner product $\langle \cdot, \cdot \rangle$ and a corresponding norm $\|\cdot\|$, both $g$ and $f$ are convex, and $\nabla f$ is further Lipschitz continuous: $\|\nabla f(X_1) - \nabla f(X_2)\| \le L_f \|X_1 - X_2\|$, one may approximate $f(X)$ locally as a quadratic function and solve

$$X_{k+1} = \arg\min_X Q(X, Y_k) = f(Y_k) + \langle \nabla f(Y_k), X - Y_k \rangle + \frac{L_f}{2}\|X - Y_k\|^2 + g(X), \tag{8}$$

which is assumed to be easy, to update the solution $X$. The convergence behavior of this iteration depends strongly on the points $Y_k$ at which the approximations $Q(X, Y_k)$ are formed. The natural choice $Y_k = X_k$ (proposed, e.g., by [11]) can be interpreted as a gradient algorithm, and results in a convergence rate no worse than $O(k^{-1})$ [4]. However, for smooth $g$, Nesterov showed that instead setting $Y_k = X_k + \frac{t_{k-1} - 1}{t_k}(X_k - X_{k-1})$ for a sequence $\{t_k\}$ satisfying $t_{k+1}^2 - t_{k+1} \le t_k^2$ can improve the convergence rate to $O(k^{-2})$ [20]. Recently, Beck and Teboulle extended this scheme to nonsmooth $g$, again demonstrating a convergence rate of $O(k^{-2})$, in the sense that $F(X_k) - F(X^*) \le C k^{-2}$ [4].
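The text only requires $t_{k+1}^2 - t_{k+1} \le t_k^2$. The common choice $t_{k+1} = (1 + \sqrt{1 + 4t_k^2})/2$, which is also line 8 of Algorithm 2, satisfies it with equality and makes $t_k$ grow at least like $(k+2)/2$; this growth is what the $O(k^{-2})$ bound relies on. A small pure-Python sketch (function names are illustrative):

```python
import math


def fista_weights(n, t0=1.0):
    """Return t_0, ..., t_n with t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2."""
    ts = [t0]
    for _ in range(n):
        ts.append((1.0 + math.sqrt(1.0 + 4.0 * ts[-1] ** 2)) / 2.0)
    return ts


def extrapolation_weights(ts):
    """Momentum coefficients (t_{k-1} - 1) / t_k used to form Y_k from X_k, X_{k-1}."""
    return [(ts[k - 1] - 1.0) / ts[k] for k in range(1, len(ts))]
```

The momentum coefficients start at 0 (since $t_0 = 1$) and increase towards 1, so early iterations behave like plain proximal gradient steps and later ones extrapolate more aggressively.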

The above accelerated proximal gradient approach can be directly applied to a relaxed version of the RPCA problem, by identifying

$$X = (A, E), \quad f(X) = \frac{1}{2}\|D - A - E\|_F^2, \quad \text{and} \quad g(X) = \mu\left(\|A\|_* + \lambda\|E\|_1\right),$$

where $\mu$ is a small positive scalar. A continuation technique [23], which varies $\mu$, starting from a large initial value $\mu_0$ and decreasing it geometrically with each iteration until it reaches the floor $\bar{\mu}$, can greatly speed up the convergence. The APG approach for RPCA is described in Algorithm 2 (for details see [18, 27]).
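A minimal NumPy sketch of Algorithm 2 follows. Several choices are assumptions not fixed by the text: $\mu_0 = 0.99\|D\|_2$, $\bar{\mu} = 10^{-9}\mu_0$, $\eta = 0.9$, full SVDs instead of partial ones, and a simple stopping test (relative change of $(A, E)$, checked only after $\mu_k$ has reached its floor) in place of the criterion used in [18]. The gradient of $f(A, E) = \frac{1}{2}\|D - A - E\|_F^2$ is 2-Lipschitz in $(A, E)$, which is where the factors $\frac{1}{2}$ in lines 4 and 6 and the thresholds $\mu_k/2$ and $\lambda\mu_k/2$ come from.

```python
import numpy as np


def shrink(X, eps):
    """Element-wise soft-thresholding S_eps[X]."""
    return np.sign(X) * np.maximum(np.abs(X) - eps, 0.0)


def svt(W, eps):
    """Singular value thresholding U S_eps[S] V^T."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * shrink(s, eps)) @ Vt


def rpca_apg(D, lam, eta=0.9, floor_ratio=1e-9, tol=1e-7, max_iter=1000):
    """Sketch of Algorithm 2: accelerated proximal gradient with continuation."""
    A = A_prev = np.zeros_like(D)
    E = E_prev = np.zeros_like(D)
    t = t_prev = 1.0
    mu = 0.99 * np.linalg.norm(D, 2)   # assumed mu_0
    mu_bar = floor_ratio * mu          # assumed floor for mu
    norm_D = np.linalg.norm(D)
    for _ in range(max_iter):
        w = (t_prev - 1.0) / t         # (t_{k-1} - 1) / t_k
        YA = A + w * (A - A_prev)      # line 3
        YE = E + w * (E - E_prev)
        step = 0.5 * (YA + YE - D)     # (1 / L_f) * grad f with L_f = 2
        A_prev, E_prev = A, E
        A = svt(YA - step, mu / 2.0)                 # lines 4-5
        E = shrink(YE - step, lam * mu / 2.0)        # lines 6-7
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0  # line 8
        at_floor = mu == mu_bar
        mu = max(eta * mu, mu_bar)                   # continuation
        change = np.sqrt(np.linalg.norm(A - A_prev) ** 2
                         + np.linalg.norm(E - E_prev) ** 2) / norm_D
        if at_floor and change < tol:
            break
    return A, E
```

Every iteration costs one SVD, so the continuation factor $\eta$ trades the number of iterations against how quickly the thresholds shrink.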

2.3 The Dual Approach 

The dual approach proposed in our earlier work [18] tackles the problem (2) via its dual. That is, one first solves the dual problem

$$\max_Y \langle D, Y \rangle, \quad \text{subject to} \quad J(Y) \le 1, \tag{9}$$

for the optimal Lagrange multiplier $Y$, where

$$\langle A, B \rangle = \operatorname{tr}(A^T B), \qquad J(Y) = \max\left(\|Y\|_2, \lambda^{-1}\|Y\|_\infty\right), \tag{10}$$

and $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are the spectral norm (largest singular value) and the maximum absolute value of the matrix entries, respectively. A steepest ascent algorithm constrained on the surface $\{Y \mid J(Y) = 1\}$ can be adopted to solve (9), where
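Algorithm 2 (RPCA via Accelerated Proximal Gradient)

Input: Observation matrix $D \in \mathbb{R}^{m \times n}$, $\lambda$.
1: $A_0 = A_{-1} = 0$; $E_0 = E_{-1} = 0$; $t_0 = t_{-1} = 1$; $\bar{\mu} > 0$; $\eta < 1$.
2: while not converged do
3: $Y_k^A = A_k + \frac{t_{k-1} - 1}{t_k}(A_k - A_{k-1})$, $Y_k^E = E_k + \frac{t_{k-1} - 1}{t_k}(E_k - E_{k-1})$.
4: $G_k^A = Y_k^A - \frac{1}{2}(Y_k^A + Y_k^E - D)$.
5: $(U, S, V) = \mathrm{svd}(G_k^A)$, $A_{k+1} = U \mathcal{S}_{\mu_k/2}[S] V^T$.
6: $G_k^E = Y_k^E - \frac{1}{2}(Y_k^A + Y_k^E - D)$.
7: $E_{k+1} = \mathcal{S}_{\lambda\mu_k/2}[G_k^E]$.
8: $t_{k+1} = \frac{1 + \sqrt{4t_k^2 + 1}}{2}$; $\mu_{k+1} = \max(\eta\mu_k, \bar{\mu})$.
9: $k \leftarrow k + 1$.
10: end while
Output: $A \leftarrow A_k$, $E \leftarrow E_k$.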



the constrained steepest ascent direction is obtained by projecting $D$ onto the tangent cone of the convex body $\{Y \mid J(Y) \le 1\}$. It turns out that the optimal solution to the primal problem (2) can be obtained during the process of finding the constrained steepest ascent direction. For details of the final algorithm, one may refer to [18].
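The dual constraint function $J$ in (10) is cheap to evaluate, and scaling any nonzero matrix by $1/J$ places it on the surface $J(Y) = 1$. A small NumPy sketch with illustrative names: `feasible_dual_point` builds the two starting multipliers used later, $\mathrm{sgn}(D)/J(\mathrm{sgn}(D))$ in Algorithm 4 and $D/J(D)$ in Algorithm 5. By weak duality, any $Y$ with $J(Y) \le 1$ satisfies $\langle D, Y \rangle \le \|A\|_* + \lambda\|E\|_1$ for every split $D = A + E$, because $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are the dual norms of $\|\cdot\|_*$ and $\|\cdot\|_1$.

```python
import numpy as np


def J(Y, lam):
    """J(Y) = max(||Y||_2, ||Y||_inf / lam) from (10)."""
    return max(np.linalg.norm(Y, 2), np.abs(Y).max() / lam)


def dual_objective(D, Y):
    """<D, Y> = tr(D^T Y)."""
    return float(np.sum(D * Y))


def feasible_dual_point(D, lam, use_sign=True):
    """Scale sgn(D) (or D itself) onto the surface J(Y) = 1."""
    Y = np.sign(D) if use_sign else D
    return Y / J(Y, lam)
```

Since $\langle D, \mathrm{sgn}(D) \rangle = \|D\|_1$, the sign-based start tends to give a large dual objective, which is the motivation stated for the initialization of Algorithm 4.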

A merit of the dual approach is that only the principal singular space associated with the largest singular value 1 is needed. In theory, computing this special principal singular space should be easier than computing the principal singular space associated with the unknown leading singular values. So the dual approach is promising if an efficient method for computing the principal singular space associated with the known largest singular value can be obtained.

3 The Methods of Augmented Lagrange Multipliers 

In [5], the general method of augmented Lagrange multipliers is introduced for solving constrained optimization problems of the kind:

$$\min_X f(X), \quad \text{subject to} \quad h(X) = 0, \tag{11}$$

where $f: \mathbb{R}^n \to \mathbb{R}$ and $h: \mathbb{R}^n \to \mathbb{R}^m$. One may define the augmented Lagrangian function:

$$L(X, Y, \mu) = f(X) + \langle Y, h(X) \rangle + \frac{\mu}{2}\|h(X)\|_F^2, \tag{12}$$

where $\mu$ is a positive scalar, and then the optimization problem can be solved via the method of augmented Lagrange multipliers, outlined as Algorithm 3 (see [6] for more details).
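To see Algorithm 3 and the role of $\mu_k$ on a problem small enough to check by hand, consider the toy instance (our own choice, not from the paper) $\min x_1^2 + 2x_2^2$ subject to $h(x) = x_1 + x_2 - 1 = 0$, whose solution is $x^* = (2/3, 1/3)$ with multiplier $y^* = -4/3$. Step 3 then has a closed form, and one can verify that the multiplier error obeys $y_{k+1} - y^* = (y_k - y^*)/(1 + 0.75\mu_k)$: a fixed $\mu$ gives Q-linear convergence and a growing $\mu_k$ gives super-linear convergence, in line with the behavior described below. A pure-Python sketch:

```python
def alm_toy(mu0=1.0, rho=2.0, iters=12):
    """Algorithm 3 on: min x1^2 + 2*x2^2  s.t.  h(x) = x1 + x2 - 1 = 0.

    Returns the last iterate, the last multiplier, and |y_k - y*| per step,
    where the exact solution is x* = (2/3, 1/3) and y* = -4/3.
    """
    y, mu = 0.0, mu0
    errors = []
    for _ in range(iters):
        # Line 3: argmin_x L(x, y, mu). Zeroing the gradient gives
        # x1 = -c/2 and x2 = -c/4 with c = (y - mu) / (1 + 0.75*mu).
        c = (y - mu) / (1.0 + 0.75 * mu)
        x1, x2 = -c / 2.0, -c / 4.0
        # Line 4: multiplier update with step size mu_k.
        y = y + mu * (x1 + x2 - 1.0)
        errors.append(abs(y + 4.0 / 3.0))
        # Line 5: increase the penalty parameter.
        mu *= rho
    return (x1, x2), y, errors
```

The default of 12 iterations is deliberate: with very large $\mu_k$, the product $\mu_k h(x)$ amplifies floating-point rounding in $h(x)$, one practical reason to avoid letting $\mu_k$ grow without limit.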

Under some rather general conditions, when $\{\mu_k\}$ is an increasing sequence and both $f$ and $h$ are continuously differentiable functions, it has been proven in [5] that the Lagrange multipliers $Y_k$ produced by Algorithm 3 converge Q-linearly to the optimal solution when $\{\mu_k\}$ is bounded and super-Q-linearly when $\{\mu_k\}$ is unbounded. This superior convergence property of ALM makes it very attractive. Another merit of ALM is that the optimal step size to update $Y_k$ is proven to be the chosen penalty parameter $\mu_k$, making the parameter tuning much easier than for the iterative thresholding algorithm. A third merit of ALM is that the algorithm converges to the exact optimal solution, even without requiring $\mu_k$ to approach infinity [5]. In contrast, strictly speaking both
Algorithm 3 (General Method of Augmented Lagrange Multiplier)

1: $\rho > 1$.
2: while not converged do
3: Solve $X_{k+1} = \arg\min_X L(X, Y_k, \mu_k)$.
4: $Y_{k+1} = Y_k + \mu_k h(X_{k+1})$;
5: Update $\mu_k$ to $\mu_{k+1}$.
6: end while
Output: $X_k$.



the iterative thresholding and APG approaches mentioned earlier only find approximate 
solutions for the problem. Finally, the analysis (of convergence) and the implementation 
of the ALM algorithms are relatively simple, as we will demonstrate on both the matrix 
recovery and matrix completion problems. 

3.1 Two ALM Algorithms for Robust PCA (Matrix Recovery)

For the RPCA problem (2), we may apply the augmented Lagrange multiplier method by identifying:

$$X = (A, E), \quad f(X) = \|A\|_* + \lambda\|E\|_1, \quad \text{and} \quad h(X) = D - A - E.$$

Then the augmented Lagrangian function is:

$$L(A, E, Y, \mu) = \|A\|_* + \lambda\|E\|_1 + \langle Y, D - A - E \rangle + \frac{\mu}{2}\|D - A - E\|_F^2, \tag{13}$$

and the ALM method for solving the RPCA problem can be described in Algorithm 4, which we will refer to as the exact ALM (EALM) method, for reasons that will soon become clear.

The initialization $Y_0^* = \mathrm{sgn}(D)/J(\mathrm{sgn}(D))$ in the algorithm is inspired by the dual problem (9), as it is likely to make the objective function value $\langle D, Y_0^* \rangle$ reasonably large.

Although the objective function of the RPCA problem (2) is non-smooth and hence the results in [5] do not directly apply here, we can still prove that Algorithm 4 has the same excellent convergence property. More precisely, we have established the following statement.

Theorem 1 For Algorithm 4, any accumulation point $(A^*, E^*)$ of $(A_k^*, E_k^*)$ is an optimal solution to the RPCA problem, and the convergence rate is at least $O(\mu_{k-1}^{-1})$ in the sense that

$$\left| \|A_k^*\|_* + \lambda\|E_k^*\|_1 - f^* \right| = O(\mu_{k-1}^{-1}),$$

where $f^*$ is the optimal value of the RPCA problem.

Proof See Appendix A.3.

From Theorem 1 we see that if $\mu_k$ grows geometrically, the EALM method will converge Q-linearly; and if $\mu_k$ grows faster, the EALM method will also converge faster. However, numerical tests show that for larger $\mu_k$, the iterative thresholding approach to solve the sub-problem $(A_{k+1}^*, E_{k+1}^*) = \arg\min_{A,E} L(A, E, Y_k^*, \mu_k)$ will converge more slowly. As the
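Algorithm 4 (RPCA via the Exact ALM Method)

Input: Observation matrix $D \in \mathbb{R}^{m \times n}$, $\lambda$.
1: $Y_0^* = \mathrm{sgn}(D)/J(\mathrm{sgn}(D))$; $\mu_0 > 0$; $\rho > 1$; $k = 0$.
2: while not converged do
3: // Lines 4-12 solve $(A_{k+1}^*, E_{k+1}^*) = \arg\min_{A,E} L(A, E, Y_k^*, \mu_k)$.
4: $A_{k+1}^0 = A_k^*$, $E_{k+1}^0 = E_k^*$, $j = 0$;
5: while not converged do
6: // Lines 7-8 solve $A_{k+1}^{j+1} = \arg\min_A L(A, E_{k+1}^j, Y_k^*, \mu_k)$.
7: $(U, S, V) = \mathrm{svd}(D - E_{k+1}^j + \mu_k^{-1} Y_k^*)$;
8: $A_{k+1}^{j+1} = U \mathcal{S}_{\mu_k^{-1}}[S] V^T$;
9: // Line 10 solves $E_{k+1}^{j+1} = \arg\min_E L(A_{k+1}^{j+1}, E, Y_k^*, \mu_k)$.
10: $E_{k+1}^{j+1} = \mathcal{S}_{\lambda\mu_k^{-1}}[D - A_{k+1}^{j+1} + \mu_k^{-1} Y_k^*]$;
11: $j \leftarrow j + 1$.
12: end while
13: $Y_{k+1}^* = Y_k^* + \mu_k (D - A_{k+1}^* - E_{k+1}^*)$.
14: Update $\mu_k$ to $\mu_{k+1}$.
15: $k \leftarrow k + 1$.
16: end while
Output: $(A_k^*, E_k^*)$.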



SVD accounts for the majority of the computational load, the choice of $\{\mu_k\}$ should be judicious so that the total number of SVDs is minimal.
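A compact NumPy sketch of Algorithm 4, written to show the nested structure rather than to reproduce reported timings. Assumptions: full SVDs instead of PROPACK, $\mu_0 = 0.5/\|\mathrm{sgn}(D)\|_2$ and $\rho = 6$ (the values chosen in Section 4), inner and outer relative tolerances of $10^{-6}$ and $10^{-7}$, iteration caps of our own, and the update order of Algorithm 4 as written ($A$ before $E$).

```python
import numpy as np


def shrink(X, eps):
    """Element-wise soft-thresholding S_eps[X]."""
    return np.sign(X) * np.maximum(np.abs(X) - eps, 0.0)


def svt(W, eps):
    """Singular value thresholding U S_eps[S] V^T."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * shrink(s, eps)) @ Vt


def rpca_ealm(D, lam, rho=6.0, tol_outer=1e-7, tol_inner=1e-6,
              max_outer=30, max_inner=300):
    """Sketch of Algorithm 4 (exact ALM) using full SVDs."""
    sgn = np.sign(D)
    norm_sgn = np.linalg.norm(sgn, 2)
    Y = sgn / max(norm_sgn, np.abs(sgn).max() / lam)  # line 1: sgn(D)/J(sgn(D))
    mu = 0.5 / norm_sgn                               # mu_0 from Section 4
    norm_D = np.linalg.norm(D)
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_outer):
        # Lines 4-12: alternate A- and E-steps on L(A, E, Y, mu) until both stall.
        for _ in range(max_inner):
            A_new = svt(D - E + Y / mu, 1.0 / mu)          # lines 7-8
            E_new = shrink(D - A_new + Y / mu, lam / mu)   # line 10
            dA = np.linalg.norm(A_new - A) / norm_D
            dE = np.linalg.norm(E_new - E) / norm_D
            A, E = A_new, E_new
            if dA < tol_inner and dE < tol_inner:
                break
        residual = D - A - E
        Y = Y + mu * residual                              # line 13
        mu *= rho                                          # line 14
        if np.linalg.norm(residual) / norm_D < tol_outer:
            break
    return A, E
```

Each inner iteration costs one SVD, so the total SVD count is the sum of the inner-loop lengths; this is the quantity that the choice of $\{\mu_k\}$ should keep small.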

Fortunately, as it turns out, we do not have to solve the sub-problem

$$(A_{k+1}^*, E_{k+1}^*) = \arg\min_{A,E} L(A, E, Y_k^*, \mu_k)$$

exactly. Rather, updating $A_k$ and $E_k$ once when solving this sub-problem is sufficient for $A_k$ and $E_k$ to converge to the optimal solution of the RPCA problem. This leads to an inexact ALM (IALM) method, described in Algorithm 5.



Algorithm 5 (RPCA via the Inexact ALM Method)

Input: Observation matrix $D \in \mathbb{R}^{m \times n}$, $\lambda$.
1: $Y_0 = D/J(D)$; $E_0 = 0$; $\mu_0 > 0$; $\rho > 1$; $k = 0$.
2: while not converged do
3: // Lines 4-5 solve $A_{k+1} = \arg\min_A L(A, E_k, Y_k, \mu_k)$.
4: $(U, S, V) = \mathrm{svd}(D - E_k + \mu_k^{-1} Y_k)$;
5: $A_{k+1} = U \mathcal{S}_{\mu_k^{-1}}[S] V^T$.
6: // Line 7 solves $E_{k+1} = \arg\min_E L(A_{k+1}, E, Y_k, \mu_k)$.
7: $E_{k+1} = \mathcal{S}_{\lambda\mu_k^{-1}}[D - A_{k+1} + \mu_k^{-1} Y_k]$.
8: $Y_{k+1} = Y_k + \mu_k (D - A_{k+1} - E_{k+1})$.
9: Update $\mu_k$ to $\mu_{k+1}$.
10: $k \leftarrow k + 1$.
11: end while
Output: $(A_k, E_k)$.
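A NumPy sketch of Algorithm 5. Assumptions beyond the text: full SVDs, $\lambda = 1/\sqrt{\max(m, n)}$ as a default, $\mu_0 = 1.25/\|D\|_2$ and $\rho = 1.6$ (the values chosen in Section 4), a plain geometric update of $\mu_k$ instead of the adaptive rule (25), and a cap $\mu_k \le 10^7\mu_0$, which is a hypothetical safeguard not stated in the paper. The cap makes $\mu_k$ nondecreasing and eventually constant, so $\sum_k \mu_k^{-1} = +\infty$ and the sketch stays within the hypothesis of the theorem that follows.

```python
import numpy as np


def shrink(X, eps):
    """Element-wise soft-thresholding S_eps[X]."""
    return np.sign(X) * np.maximum(np.abs(X) - eps, 0.0)


def svt(W, eps):
    """Singular value thresholding U S_eps[S] V^T."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * shrink(s, eps)) @ Vt


def rpca_ialm(D, lam=None, rho=1.6, tol=1e-7, max_iter=1000, mu_cap_ratio=1e7):
    """Sketch of Algorithm 5 (inexact ALM): one A-step and one E-step per
    multiplier update, so each iteration costs exactly one SVD."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))           # assumed default
    norm2 = np.linalg.norm(D, 2)
    Y = D / max(norm2, np.abs(D).max() / lam)    # line 1: Y_0 = D / J(D)
    mu = 1.25 / norm2                            # mu_0 from Section 4
    mu_max = mu_cap_ratio * mu                   # hypothetical cap on mu_k
    E = np.zeros_like(D)
    norm_D = np.linalg.norm(D)
    for k in range(1, max_iter + 1):
        A = svt(D - E + Y / mu, 1.0 / mu)        # lines 4-5
        E = shrink(D - A + Y / mu, lam / mu)     # line 7
        residual = D - A - E
        Y = Y + mu * residual                    # line 8
        mu = min(rho * mu, mu_max)               # line 9 (simplified rule)
        if np.linalg.norm(residual) / norm_D < tol:
            break
    return A, E, k
```

Compared with the exact method, the only structural change is the missing inner loop; that is where the savings in SVDs come from.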



The validity and optimality of Algorithm 5 are guaranteed by the following theorem.
Theorem 2 For Algorithm 5, if $\{\mu_k\}$ is nondecreasing and $\sum_{k=1}^{+\infty} \mu_k^{-1} = +\infty$, then $(A_k, E_k)$ converges to an optimal solution $(A^*, E^*)$ to the RPCA problem.

Proof See Appendix A.5.

We can further prove that the condition $\sum_{k=1}^{+\infty} \mu_k^{-1} = +\infty$ is also necessary to ensure the convergence, as stated in the following theorem.

Theorem 3 If $\sum_{k=1}^{+\infty} \mu_k^{-1} < +\infty$, then the sequence $\{(A_k, E_k)\}$ produced by Algorithm 5 may not converge to the optimal solution of the RPCA problem.

Proof See Appendix A.6.

Note that, unlike Theorem 1 for the exact ALM method, Theorem 2 only guarantees convergence but does not specify the rate of convergence for the inexact ALM method. The condition $\sum_{k=1}^{+\infty} \mu_k^{-1} = +\infty$ implies that $\mu_k$ cannot grow too fast. The choice of $\mu_k$ will be discussed in Section 4. Actually, IALM is the alternating direction method (ADM) in the literature [16, 13]. However, the traditional ADM requires that the penalty parameter $\mu_k$ be upper bounded, while in this paper we do not impose such a constraint. The advantage of unbounded $\{\mu_k\}$ is that the feasibility condition $A_k + E_k = D$ can be approached more quickly, because $D - A_k - E_k = (Y_k - Y_{k-1})/\mu_{k-1}$ and $\{Y_k\}$ is bounded.
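The condition in Theorems 2 and 3 concerns the series $\sum_k \mu_k^{-1}$. The pure-Python sketch below compares three hypothetical schedules: geometric growth $\mu_k = \rho^k$ (the series converges to $1/(\rho - 1)$, so Theorem 2 does not apply), linear growth $\mu_k = k$ (a harmonic series, which diverges), and geometric growth capped at a constant (the tail adds a fixed positive amount per step, so it diverges).

```python
def inverse_partial_sums(mu_of_k, checkpoints):
    """Return {N: sum_{k=1}^{N} 1/mu_k} for every N in `checkpoints`."""
    total, out = 0.0, {}
    for k in range(1, max(checkpoints) + 1):
        total += 1.0 / mu_of_k(k)
        if k in checkpoints:
            out[k] = total
    return out


rho = 1.6
schedules = {
    "geometric": lambda k: rho ** k,          # sum -> 1 / (rho - 1): finite
    "linear": lambda k: float(k),             # harmonic series: diverges
    "capped": lambda k: min(rho ** k, 1e7),   # constant tail: diverges
}
sums = {name: inverse_partial_sums(f, {10, 100, 1000})
        for name, f in schedules.items()}
for name, partial in sums.items():
    print(name, [round(partial[N], 6) for N in sorted(partial)])
```

The capped schedule grows as fast as the geometric one early on, which helps feasibility, yet it still satisfies the divergence condition; this is one simple way to respect Theorem 2 in practice.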

Remark 1 Theorems 2 and 3 are also true for more general problems as follows:

$$\min_{x,y} f(x) + g(y), \quad \text{s.t.} \quad x + y = b, \tag{14}$$

where $f(x)$ and $g(y)$ are both convex functions with bounded subgradients.

Remark 2 When $\{\mu_k\}$ is non-increasing and nonnegative, we can also prove, in a similar manner, that Algorithm 5 converges globally if and only if $\sum_{k=1}^{+\infty} \mu_k = +\infty$, and this is also true for general problems like (14). Note that in the traditional theory of ADM, $\{\mu_k\}$ is usually assumed to be bounded away from zero.

3.2 An ALM Algorithm for Matrix Completion 

The matrix completion (MC) problem can be viewed as a special case of the matrix recovery problem, where one has to recover the missing entries of a matrix, given a limited number of known entries. Such a problem is ubiquitous, e.g., in machine learning [1, 2, 3], control [19] and computer vision [24]. In many applications, it is reasonable to assume that the matrix to recover is of low rank. In a recent paper [9], Candès and Recht proved that most matrices $A$ of rank $r$ can be perfectly recovered by solving the following optimization problem:

$$\min_A \|A\|_*, \quad \text{subject to} \quad A_{ij} = D_{ij}, \; \forall (i, j) \in \Omega, \tag{15}$$
provided that the number $p$ of samples obeys $p \ge C n^{6/5} r \ln n$ for some positive constant $C$, where $\Omega$ is the set of indices of samples. This bound has since been improved by the work of several others. The state-of-the-art algorithms to solve the MC problem (15) include the APG approach [23] and the singular value thresholding (SVT) approach [7]. As the RPCA problem is closely connected to the MC problem, it is natural to believe that the ALM method can be similarly effective on the MC problem.
We may formulate the MC problem as follows:

$$\min_A \|A\|_*, \quad \text{subject to} \quad A + E = D, \; \pi_\Omega(E) = 0, \tag{16}$$

where $\pi_\Omega: \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ is a linear operator that keeps the entries in $\Omega$ unchanged and sets those outside $\Omega$ (i.e., in $\bar{\Omega}$) to zero. As $E$ will compensate for the unknown entries of $D$, the unknown entries of $D$ are simply set to zero. Then the partial augmented Lagrangian function (Section 2.4 of [5]) of (16) is

$$L(A, E, Y, \mu) = \|A\|_* + \langle Y, D - A - E \rangle + \frac{\mu}{2}\|D - A - E\|_F^2. \tag{17}$$

Then, similarly, we can have the exact and inexact ALM approaches for the MC problem, where for updating $E$ the constraint $\pi_\Omega(E) = 0$ should be enforced when minimizing $L(A, E, Y, \mu)$. The inexact ALM approach is described in Algorithm 6.



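Algorithm 6 (Matrix Completion via the Inexact ALM Method)

Input: Observation samples $D_{ij}$, $(i, j) \in \Omega$, of matrix $D \in \mathbb{R}^{m \times n}$.
1: $Y_0 = 0$; $E_0 = 0$; $\mu_0 > 0$; $\rho > 1$; $k = 0$.
2: while not converged do
3: // Lines 4-5 solve $A_{k+1} = \arg\min_A L(A, E_k, Y_k, \mu_k)$.
4: $(U, S, V) = \mathrm{svd}(D - E_k + \mu_k^{-1} Y_k)$;
5: $A_{k+1} = U \mathcal{S}_{\mu_k^{-1}}[S] V^T$.
6: // Line 7 solves $E_{k+1} = \arg\min_{\pi_\Omega(E) = 0} L(A_{k+1}, E, Y_k, \mu_k)$.
7: $E_{k+1} = \pi_{\bar{\Omega}}(D - A_{k+1} + \mu_k^{-1} Y_k)$.
8: $Y_{k+1} = Y_k + \mu_k (D - A_{k+1} - E_{k+1})$.
9: Update $\mu_k$ to $\mu_{k+1}$.
10: $k \leftarrow k + 1$.
11: end while
Output: $(A_k, E_k)$.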



Note that due to the choice of $E_k$, $\pi_{\bar{\Omega}}(Y_k) = 0$ holds throughout the iteration, i.e., the values of $Y_k$ at unknown entries are always zero. Theorems 1 and 2 are also true for the matrix completion problem. As the proofs are similar, we omit them here. Theorem 3 is also true for the matrix completion problem, because it is easy to verify that $Y_k = \pi_\Omega(\hat{Y}_k)$. As $\{\hat{Y}_k\}$ is bounded (cf. Lemma 1), $\{Y_k\}$ is also bounded. So the proof of Theorem 3 is still valid for the matrix completion problem.
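A dense NumPy sketch of Algorithm 6, suitable for small problems only: it stores full $m \times n$ arrays rather than the sparse/low-rank representation described in Section 4, uses full SVDs, takes $\mu_0 = 1/\|D\|_2$ and $\rho = 1.2172 + 1.8588\rho_s$ from Section 4, and replaces the adaptive rule (25) by geometric growth capped at $10^7\mu_0$ (an assumed safeguard). The boolean array `mask` marks $\Omega$. Line 7 is the projection $\pi_{\bar{\Omega}}$, which is why the returned multiplier stays exactly zero outside $\Omega$, as noted above.

```python
import numpy as np


def shrink(X, eps):
    """Element-wise soft-thresholding S_eps[X]."""
    return np.sign(X) * np.maximum(np.abs(X) - eps, 0.0)


def svt(W, eps):
    """Singular value thresholding U S_eps[S] V^T."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * shrink(s, eps)) @ Vt


def mc_ialm(D_obs, mask, tol=1e-7, max_iter=500, mu_cap_ratio=1e7):
    """Dense sketch of Algorithm 6; mask[i, j] is True for (i, j) in Omega."""
    D = np.where(mask, D_obs, 0.0)           # unknown entries of D set to zero
    rho = 1.2172 + 1.8588 * mask.mean()      # regression rule from Section 4
    mu = 1.0 / np.linalg.norm(D, 2)          # mu_0 from Section 4
    mu_max = mu_cap_ratio * mu               # assumed cap on mu_k
    Y = np.zeros_like(D)
    E = np.zeros_like(D)
    norm_D = np.linalg.norm(D)
    for k in range(1, max_iter + 1):
        A = svt(D - E + Y / mu, 1.0 / mu)                # lines 4-5
        E = np.where(mask, 0.0, D - A + Y / mu)          # line 7: pi_{Omega-bar}
        residual = D - A - E
        Y = Y + mu * residual                            # line 8
        mu = min(rho * mu, mu_max)                       # line 9, simplified
        if np.linalg.norm(residual) / norm_D < tol:
            break
    return A, Y, k
```

Returning $Y$ makes it easy to check the invariant $\pi_{\bar{\Omega}}(Y_k) = 0$ directly on the output.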

4 Implementation Details 

Predicting the Dimension of Principal Singular Space. It is apparent that computing the full SVD for the RPCA and MC problems is unnecessary: we only need those singular values that are larger than a particular threshold and their corresponding singular vectors. So a software package, PROPACK [17], has been widely recommended in the community. To use PROPACK, one has to predict the dimension of the principal singular space whose singular values are larger than a given threshold. For Algorithm 5, the prediction is relatively easy, as the rank of $A_k$ is observed to be monotonically increasing and to become stable at the true rank. So the prediction rule is:

$$\mathrm{sv}_{k+1} = \begin{cases} \mathrm{svp}_k + 1, & \text{if } \mathrm{svp}_k < \mathrm{sv}_k, \\ \min(\mathrm{svp}_k + \mathrm{round}(0.05d), d), & \text{if } \mathrm{svp}_k = \mathrm{sv}_k, \end{cases} \tag{18}$$

where $d = \min(m, n)$, $\mathrm{sv}_k$ is the predicted dimension, $\mathrm{svp}_k$ is the number of singular values, among the $\mathrm{sv}_k$ computed ones, that are larger than $\mu_k^{-1}$, and $\mathrm{sv}_0 = 10$. Algorithm 4 also uses the above prediction strategy for the inner loop that solves $(A_{k+1}^*, E_{k+1}^*)$. For the outer loop, the prediction rule is simply $\mathrm{sv}_{k+1} = \min(\mathrm{svp}_k + \mathrm{round}(0.1d), d)$.
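Rule (18) is simple bookkeeping around the partial SVD. A pure-Python sketch with illustrative names, assuming `singular_values` holds the $\mathrm{sv}_k$ values returned by the partial SVD; `matlab_round` follows Matlab's round-half-away-from-zero convention (Python's built-in `round` rounds halves to even), which is our assumption about the intended rounding.

```python
import math


def matlab_round(x):
    """Round half away from zero, like Matlab's round()."""
    return int(math.floor(abs(x) + 0.5)) * (1 if x >= 0 else -1)


def count_svp(singular_values, mu):
    """svp_k: how many of the computed singular values exceed 1/mu_k."""
    return sum(1 for s in singular_values if s > 1.0 / mu)


def next_sv_inner(sv_k, svp_k, d):
    """Rule (18), used by Algorithm 5 and by the inner loop of Algorithm 4."""
    if svp_k < sv_k:
        # Fewer values than requested survived thresholding: ask for one extra.
        return svp_k + 1
    # Every requested value survived: the rank may be larger, so grow by 5% of d.
    return min(svp_k + matlab_round(0.05 * d), d)


def next_sv_outer(svp_k, d):
    """Outer-loop rule of Algorithm 4: min(svp_k + round(0.1 d), d)."""
    return min(svp_k + matlab_round(0.1 * d), d)
```

Requesting one more value than survived thresholding is what lets the solver confirm that the rank has stopped increasing.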
As for Algorithm 6, the prediction is much more difficult, as the ranks of $A_k$ are often oscillating. It is also often the case that for small $k$ the ranks of $A_k$ are close to $d$ and then gradually decrease to the true rank, making the partial SVD inefficient². To remedy this issue, we initialize both $Y$ and $A$ as zero matrices, and adopt the following truncation strategy, which is similar to that in [23]:

$$\mathrm{sv}_{k+1} = \begin{cases} \mathrm{svn}_k + 1, & \text{if } \mathrm{svn}_k < \mathrm{sv}_k, \\ \min(\mathrm{svn}_k + 10, d), & \text{if } \mathrm{svn}_k = \mathrm{sv}_k, \end{cases} \tag{19}$$

where $\mathrm{sv}_0 = 5$ and

$$\mathrm{svn}_k = \begin{cases} \mathrm{svp}_k, & \text{if } \mathrm{maxgap}_k < 2, \\ \min(\mathrm{svp}_k, \mathrm{maxid}_k), & \text{if } \mathrm{maxgap}_k \ge 2, \end{cases} \tag{20}$$

in which $\mathrm{maxgap}_k$ and $\mathrm{maxid}_k$ are the largest ratio between successive singular values (arranging the computed $\mathrm{sv}_k$ singular values in descending order) and the corresponding index, respectively. We utilize the gap information because we have observed that the singular values quickly separate into two groups with a large gap between them, making the rank revealing fast and reliable. With the above prediction scheme, the rank of $A_k$ becomes monotonically increasing and stable at the true rank.
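Rules (19) and (20) can be sketched in pure Python as follows. We read "the corresponding index" as the 1-based position of the larger of the two singular values that form the largest gap, i.e., the number of values above the gap; this interpretation, the function names, and the guard against a zero singular value are assumptions.

```python
def next_svn(singular_values, svp_k):
    """Rule (20). `singular_values`: the sv_k computed values, in descending order."""
    s = singular_values
    # Ratios between successive singular values; a zero denominator is an infinite gap.
    ratios = [s[i] / s[i + 1] if s[i + 1] > 0 else float("inf")
              for i in range(len(s) - 1)]
    if not ratios:
        return svp_k
    maxgap = max(ratios)
    maxid = ratios.index(maxgap) + 1   # number of values above the largest gap
    return svp_k if maxgap < 2 else min(svp_k, maxid)


def next_sv_mc(sv_k, svn_k, d):
    """Rule (19) for Algorithm 6 (started from sv_0 = 5)."""
    if svn_k < sv_k:
        return svn_k + 1
    return min(svn_k + 10, d)
```

For instance, the computed values `[10, 9, 8, 0.5, 0.4]` have their largest ratio (16) between the third and fourth values, so with $\mathrm{svp}_k = 5$ the rule truncates to 3.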

Order of Updating A and E. Although in theory updating either $A$ or $E$ first does not affect the convergence rate, numerical tests show that the order does result in slightly different numbers of iterations to achieve the same accuracy. Considering the huge complexity of the SVD for large matrices, such slight differences should also be considered. Via extensive numerical tests, we suggest updating $E$ first in Algorithms 4 and 5. Equally important, updating $E$ first also makes the rank of $A_k$ much more likely to be monotonically increasing, which is critical for the partial SVD to be effective, as elaborated in the previous paragraph.

Memory Saving for Algorithm 6. In the real implementation of Algorithm 6, sparse matrices are used to store $D$ and $Y_k$, and, as done in [23], $A$ is represented as $A = LR^T$, where both $L$ and $R$ are matrices of size $m \times \mathrm{svp}_k$. $E_k$ is not explicitly stored, by observing that

$$E_{k+1} = \pi_{\bar{\Omega}}(D - A_{k+1} + \mu_k^{-1} Y_k) = \pi_\Omega(A_{k+1}) - A_{k+1}, \tag{21}$$

where the second equality holds because $D$ and $Y_k$ vanish outside $\Omega$. In this way, only $\pi_\Omega(A_k)$ is required to compute $Y_k$ and $D - E_k + \mu_k^{-1} Y_k$. So much memory can be saved due to the small percentage of samples.
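The memory saving rests on never forming $A_k$ densely. By (21), $D - E_k + \mu_k^{-1}Y_k = \big(D + \mu_k^{-1}Y_k - \pi_\Omega(A_k)\big) + A_k$, i.e., a matrix supported on $\Omega$ plus the low-rank $LR^T$. A NumPy sketch of two primitives this needs, with hypothetical names: sampling $LR^T$ on $\Omega$ in $O(|\Omega| r)$ time, and multiplying the sparse-plus-low-rank matrix by a vector, which is the kind of operation (together with the analogous product with the transpose) that Lanczos-type partial SVD routines such as PROPACK require. Here $R$ is $n \times r$, so non-square matrices are also covered.

```python
import numpy as np


def sample_lowrank(L, R, rows, cols):
    """pi_Omega(L R^T) as a vector: (L R^T)[rows[t], cols[t]] for each sample t,
    computed without forming the m x n product."""
    return np.einsum("ij,ij->i", L[rows], R[cols])


def matvec_sparse_plus_lowrank(rows, cols, vals, L, R, x):
    """(S + L R^T) x, where S holds `vals` at (rows, cols) and is zero elsewhere."""
    y = L @ (R.T @ x)                       # low-rank part: O((m + n) r)
    np.add.at(y, rows, vals * x[cols])      # sparse part: O(|Omega|)
    return y
```

Storage is then $O((m + n)r + |\Omega|)$ instead of $O(mn)$, which is the saving referred to above.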

² Numerical tests show that when we want to compute more than $0.2d$ principal singular vectors/values, using PROPACK is often slower than computing the full SVD.
Stopping Criteria. For the RPCA problem, the KKT conditions are:

$$D - A^* - E^* = 0, \quad Y^* \in \partial\|A^*\|_*, \quad Y^* \in \partial(\|\lambda E^*\|_1). \tag{22}$$

The last two conditions hold if and only if $\partial\|A^*\|_* \cap \partial(\|\lambda E^*\|_1) \neq \emptyset$. So we may take the following conditions as the stopping criteria for Algorithms 4 and 5:

$$\|D - A_k - E_k\|_F / \|D\|_F < \varepsilon_1 \quad \text{and} \quad \mathrm{dist}(\partial\|A_k\|_*, \partial(\|\lambda E_k\|_1)) / \|D\|_F < \varepsilon_2, \tag{23}$$

where $\mathrm{dist}(X, Y) = \min\{\|x - y\|_F \mid x \in X, y \in Y\}$. For Algorithm 4, the second condition is always guaranteed by the inner loop, so we only have to check the first condition. For Algorithm 5, unfortunately, it is expensive to compute $\mathrm{dist}(\partial\|A_k\|_*, \partial(\|\lambda E_k\|_1))$, as the projection onto $\partial\|A_k\|_*$ is costly. So we may estimate $\mathrm{dist}(\partial\|A_k\|_*, \partial(\|\lambda E_k\|_1))$ by $\|\hat{Y}_k - Y_k\|_F = \mu_{k-1}\|E_k - E_{k-1}\|_F$, where $\hat{Y}_k = Y_{k-1} + \mu_{k-1}(D - A_k - E_{k-1})$, since $\hat{Y}_k \in \partial\|A_k\|_*$ and $Y_k \in \partial(\|\lambda E_k\|_1)$.

Similarly, for the MC problem we may take the following conditions as the stopping criteria for Algorithm 6:

$$\|D - A_k - E_k\|_F / \|D\|_F < \varepsilon_1 \quad \text{and} \quad \mathrm{dist}(\partial\|A_k\|_*, S) / \|D\|_F < \varepsilon_2, \tag{24}$$

where $S = \{Y \mid Y_{ij} = 0 \text{ if } (i, j) \notin \Omega\}$. Again, as it is expensive to compute $\mathrm{dist}(\partial\|A_k\|_*, S)$, we may estimate it by $\|\hat{Y}_k - Y_k\|_F = \mu_{k-1}\|E_k - E_{k-1}\|_F$³. Note that by (21), $\|E_k - E_{k-1}\|_F$ can be conveniently computed as

$$\sqrt{\|A_k - A_{k-1}\|_F^2 - \|\pi_\Omega(A_k) - \pi_\Omega(A_{k-1})\|_F^2}.$$
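With $A_k = L_k R_k^T$ stored in factored form, both terms under the square root can be evaluated without forming $m \times n$ matrices: the first from small Gram matrices, using $\|L_1R_1^T - L_0R_0^T\|_F^2 = \operatorname{tr}(L_1^TL_1R_1^TR_1) - 2\operatorname{tr}(L_1^TL_0R_0^TR_1) + \operatorname{tr}(L_0^TL_0R_0^TR_0)$, and the second by sampling on $\Omega$. A NumPy sketch with illustrative names:

```python
import numpy as np


def fro2_lowrank_diff(L1, R1, L0, R0):
    """||L1 R1^T - L0 R0^T||_F^2 from small r x r Gram matrices only."""
    return (np.trace((L1.T @ L1) @ (R1.T @ R1))
            - 2.0 * np.trace((L1.T @ L0) @ (R0.T @ R1))
            + np.trace((L0.T @ L0) @ (R0.T @ R0)))


def delta_E_norm(L1, R1, L0, R0, rows, cols):
    """||E_k - E_{k-1}||_F for Algorithm 6, using E_k = pi_Omega(A_k) - A_k."""
    full = fro2_lowrank_diff(L1, R1, L0, R0)
    # Entries of A_k - A_{k-1} on Omega, sampled directly from the factors.
    d_obs = (np.einsum("ij,ij->i", L1[rows], R1[cols])
             - np.einsum("ij,ij->i", L0[rows], R0[cols]))
    # Guard against a tiny negative value caused by cancellation.
    return float(np.sqrt(max(full - d_obs @ d_obs, 0.0)))
```

The two factorizations may have different ranks (for example, when the predicted dimension changes between iterations); the Gram-matrix formula handles that without modification.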

Updating $\mu_k$. Rather than specifying an a priori sequence $\{\mu_k\}$ that satisfies $\sum_{k=1}^{+\infty} \mu_k^{-1} = +\infty$, we propose updating $\mu_k$ adaptively as follows:

$$\mu_{k+1} = \begin{cases} \rho\mu_k, & \text{if } \mu_k\|E_{k+1} - E_k\|_F / \|D\|_F < \varepsilon_2, \\ \mu_k, & \text{otherwise}, \end{cases} \tag{25}$$

where $\rho > 1$.

Note that the condition $\mu_k\|E_{k+1} - E_k\|_F / \|D\|_F < \varepsilon_2$ in the above updating rule will eventually be satisfied if $\mu_k$ does not change, because by Lemma 2 we can easily prove that $E_{k+1} - E_k \to 0$. Although He et al. [13] also proposed an adaptive choice of $\mu_k$, it is hard to choose good parameters therein.
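Rule (25) in pure Python, taking precomputed Frobenius norms as inputs (an interface choice of ours, with a hypothetical function name). The optional `soften` flag applies the milder estimate $\min(\mu, \sqrt{\mu})$ suggested in the footnote for the MC problem.

```python
import math


def update_mu(mu, dE_norm, norm_D, rho, eps2, soften=False):
    """Rule (25): grow mu by rho only when the estimated dual residual is small.

    dE_norm is ||E_{k+1} - E_k||_F and norm_D is ||D||_F.
    """
    scale = min(mu, math.sqrt(mu)) if soften else mu
    return rho * mu if scale * dE_norm / norm_D < eps2 else mu
```

For $\mu \le 1$ the two variants coincide, since then $\min(\mu, \sqrt{\mu}) = \mu$; they differ only once $\mu_k$ has grown past 1.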

Choosing Parameters. For Algorithm 4, we set $\mu_0 = 0.5/\|\mathrm{sgn}(D)\|_2$ and $\rho = 6$. The stopping criterion for the inner loop is $\|A_{k+1}^{j+1} - A_{k+1}^{j}\|_F / \|D\|_F < 10^{-6}$ and $\|E_{k+1}^{j+1} - E_{k+1}^{j}\|_F / \|D\|_F < 10^{-6}$. The stopping criterion for the outer iteration is $\|D - A_k^* - E_k^*\|_F / \|D\|_F < 10^{-7}$. For Algorithm 5, we set $\mu_0 = 1.25/\|D\|_2$ and $\rho = 1.6$, and the parameters in the stopping criteria are $\varepsilon_1 = 10^{-7}$ and $\varepsilon_2 = 10^{-5}$. For Algorithm 6, we set $\mu_0 = 1/\|D\|_2$ and $\rho = 1.2172 + 1.8588\rho_s$, where $\rho_s = |\Omega|/(mn)$ is the sampling density and the relation between $\rho$ and $\rho_s$ is obtained by regression; the parameters in the stopping criteria are $\varepsilon_1 = 10^{-7}$ and $\varepsilon_2 = 10^{-6}$.
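For Algorithm 6, the growth factor depends only on the sampling density. A one-line pure-Python helper (name hypothetical) makes the arithmetic concrete: sampling 10% of a $1000 \times 1000$ matrix gives $\rho_s = 0.1$ and $\rho = 1.2172 + 0.18588 = 1.40308$, while full sampling would give $\rho = 3.076$.

```python
def mc_rho(num_samples, m, n):
    """rho = 1.2172 + 1.8588 * rho_s with sampling density rho_s = |Omega| / (m n)."""
    rho_s = num_samples / (m * n)
    return 1.2172 + 1.8588 * rho_s


print(mc_rho(100_000, 1000, 1000))
```

Denser sampling therefore lets $\mu_k$ grow faster per accepted update.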

³ As $\mu_{k-1}\|E_k - E_{k-1}\|_F$ actually significantly overestimates $\mathrm{dist}(\partial\|A_k\|_*, S)$, in real computation we may estimate $\mathrm{dist}(\partial\|A_k\|_*, S)$ as $\min(\mu_{k-1}, \sqrt{\mu_{k-1}})\|E_k - E_{k-1}\|_F$, and the condition in (25) should be changed accordingly.
5 Simulations

In this section, using numerical simulations, for the RPCA problem we compare the proposed ALM algorithms with the APG algorithm proposed in [18]; for the MC problem, we compare the inexact ALM algorithm with the SVT algorithm [7] and the APG algorithm [23]. All the simulations are conducted and timed on the same workstation with an Intel Xeon E5540 2.53GHz CPU that has 4 cores and 24GB memory⁴, running Windows 7 and Matlab (version 7.7)⁵.

I. Comparison on the Robust PCA Problem. For the RPCA problem, we use randomly generated square matrices for our simulations. We denote the true solution by the ordered pair $(A^*, E^*) \in \mathbb{R}^{m\times m} \times \mathbb{R}^{m\times m}$. We generate the rank-$r$ matrix $A^*$ as a product $LR^T$, where $L$ and $R$ are independent $m \times r$ matrices whose elements are i.i.d. Gaussian random variables with zero mean and unit variance³. We generate $E^*$ as a sparse matrix whose support is chosen uniformly at random, and whose nonzero entries are i.i.d. uniformly distributed in the interval $[-500, 500]$. The matrix $D = A^* + E^*$ is the input to the algorithm, and $(\hat{A}, \hat{E})$ denotes the output. We choose a fixed weighting parameter $\lambda = m^{-1/2}$ for a given problem.
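The test-data recipe above can be sketched as follows. The function name, its signature and the `sparsity` argument (the fraction $\|E^*\|_0/m^2$) are our own choices; the distributions follow the text.

```python
import numpy as np


def make_rpca_instance(m, r, sparsity, rng, magnitude=500.0):
    """Random square RPCA problem D = A* + E* following the recipe in the text.

    sparsity is the fraction ||E*||_0 / m^2 (e.g. 0.05 or 0.10).
    """
    L = rng.standard_normal((m, r))                     # i.i.d. N(0, 1)
    R = rng.standard_normal((m, r))
    A_true = L @ R.T                                    # rank-r part L R^T
    k = int(round(sparsity * m * m))                    # number of corrupted entries
    support = rng.choice(m * m, size=k, replace=False)  # uniformly random support
    E_true = np.zeros((m, m))
    E_true.flat[support] = rng.uniform(-magnitude, magnitude, size=k)
    lam = m ** -0.5                                     # weighting parameter lambda
    return A_true + E_true, A_true, E_true, lam


D, A_true, E_true, lam = make_rpca_instance(500, 25, 0.05, np.random.default_rng(0))
print(np.linalg.matrix_rank(A_true), np.count_nonzero(E_true))   # 25 12500
```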

We use the latest version of the code for Algorithm 2 provided by the authors of [18], and also apply the prediction rule (18), with $sv_0 = 5$, to it so that the partial SVD can be utilized⁴. With the partial SVD, APG is faster than the dual approach in Section 2.3, so we do not include the dual approach in the comparison.

A brief comparison of the three algorithms is presented in Tables 1 and 2. We can see that both the APG and IALM algorithms stop at relatively constant iteration numbers, and IALM is at least five times faster than APG. Moreover, the accuracies of EALM and IALM are higher than that of APG. In particular, APG often overestimates $\|E^*\|_0$, the number of nonzeros in $E^*$, quite a bit, while the estimates of $\|E^*\|_0$ by EALM and IALM are always extremely close to the ground truth.
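The accuracy columns of Tables 1 and 2 (relative error of $\hat{A}$, $\mathrm{rank}(\hat{A})$ and $\|\hat{E}\|_0$) can be computed as below. This is a sketch: the helper name and the `zero_tol` threshold for counting nonzeros are hypothetical (soft thresholding produces exact zeros in $\hat{E}$, so `zero_tol=0` counts them directly), and `numpy.linalg.matrix_rank` is used with its default SVD-based tolerance.

```python
import numpy as np


def rpca_report(A_hat, E_hat, A_true, zero_tol=0.0):
    """Relative error of A_hat, numerical rank of A_hat, and nonzeros of E_hat."""
    rel_err = np.linalg.norm(A_hat - A_true, "fro") / np.linalg.norm(A_true, "fro")
    rank = int(np.linalg.matrix_rank(A_hat))
    nnz = int(np.count_nonzero(np.abs(E_hat) > zero_tol))
    return float(rel_err), rank, nnz


A_true = np.outer([1.0, 2.0], [3.0, 4.0])           # a rank-1 ground truth
E_hat = np.array([[0.0, 5.0], [0.0, 0.0]])          # one detected corruption
print(rpca_report(A_true.copy(), E_hat, A_true))    # (0.0, 1, 1)
```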

II. Comparison on the Matrix Completion Problem. For the MC problem, the true low-rank matrix $A^*$ is first generated as for the RPCA problem. Then we sample $p$ elements uniformly from $A^*$ to form the known samples in $D$. A useful quantity for reference is $d_r = r(2m - r)$, which is the number of degrees of freedom in an $m \times m$ matrix of rank $r$ [23].
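As a worked check of $d_r = r(2m - r)$, the number of samples is $p = (p/d_r)\,d_r$. For the first three settings of Table 3 this reproduces the listed sampling densities $p/m^2$ (0.12, 0.39, 0.57) up to rounding. The function names are illustrative.

```python
def degrees_of_freedom(m, r):
    """d_r = r(2m - r) for an m x m matrix of rank r."""
    return r * (2 * m - r)


def num_samples(m, r, oversampling):
    """Number of observed entries p for a given ratio p / d_r."""
    return oversampling * degrees_of_freedom(m, r)


for m, r, ratio in [(1000, 10, 6), (1000, 50, 4), (1000, 100, 3)]:
    p = num_samples(m, r, ratio)
    print(m, r, p, round(p / m**2, 4))
# 1000 10 119400 0.1194
# 1000 50 390000 0.39
# 1000 100 570000 0.57
```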

The SVT and APGL (APG with line search⁵) codes are provided by the authors of [7] and [23], respectively. A brief comparison of the three algorithms is presented in Table 3. One can see that IALM is always faster than SVT. It is also advantageous over APGL when the sampling density $p/m^2$ is relatively high, e.g., $p/m^2 > 10\%$. This phenomenon is consistent with the results on the RPCA problem, where most samples of $D$ are assumed accurate, although the positions of the accurate samples are not known a priori.

¹ But on a Win32 system only 3GB can be used by each thread.

² Matlab code for all the algorithms compared is available at http://perception.csl.illinois.edu/matrix-rank/home.html

³ It can be shown that $A^*$ is distributed according to the random orthogonal model of rank $r$, as defined in [9].

⁴ Such a prediction scheme was not proposed in [18], so the full SVD was used therein.

⁵ For the MC problem, APGL is faster than APG without line search. However, for the RPCA problem, APGL is not faster than APG [18].
6 Conclusions

In this paper, we have proposed two augmented Lagrange multiplier based algorithms, namely EALM and IALM, for solving the Robust PCA problem. Both algorithms are faster than the previous state-of-the-art APG algorithm [18]. In particular, in all simulations IALM is consistently over five times faster than APG. We also prove the necessary and sufficient condition for IALM to converge globally.

We have also applied the method of augmented Lagrange multipliers to the matrix completion problem. The corresponding IALM algorithm is considerably faster than the famous SVT algorithm [7]. It is also faster than the state-of-the-art APGL algorithm [23] when the percentage of available entries is not too low, say > 10%.

Compared to accelerated proximal gradient based methods, augmented Lagrange multiplier based algorithms are simpler to analyze and easier to implement. Moreover, they are also of much higher accuracy, as the iterations are proven to converge to the exact solution of the problem, even if the penalty parameter does not approach infinity [5]. In contrast, APG methods normally find a close approximation to the solution by solving a relaxed problem. Finally, ALM algorithms require less storage/memory than APG for both the RPCA and MC problems⁶. For large-scale applications, such as web data analysis, this could prove to be a big advantage for ALM type algorithms.

To help the reader compare and use all the algorithms, we have posted the Matlab code of all the algorithms at the website:

http://perception.csl.illinois.edu/matrix-rank/home.html
A Proofs and Technical Details for Section 3

In this appendix, we provide the mathematical details of Section 3. To prove Theorems 1 and 2, we first prepare some results in Sections A.1 and A.2.

A.1 Relationship between Primal and Dual Norms

Our convergence theorems require the boundedness of some sequences, which results from the following theorem.

Theorem 4 Let $\mathcal{H}$ be a real Hilbert space endowed with an inner product $\langle\cdot,\cdot\rangle$ and a corresponding norm $\|\cdot\|$, and $y \in \partial\|x\|$, where $\partial f(x)$ is the subgradient of $f(x)$. Then $\|y\|^* = 1$ if $x \neq 0$, and $\|y\|^* \le 1$ if $x = 0$, where $\|\cdot\|^*$ is the dual norm of $\|\cdot\|$.



⁶ By smart reuse of intermediate matrices (at the cost of making the code hard to read), for the RPCA problem APG still needs one more intermediate (dense) matrix than IALM; for the MC problem, APG needs two more low-rank matrices (for representing $A_{k-1}$) and one more sparse matrix than IALM. Our numerical simulations confirm this too: for the MC
problem, on our workstation lALM was able to handle A* with size 10" X lO"* and rank 10^, 
while APG could not. 
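Proof As $y \in \partial\|x\|$, we have
$$\|w\| - \|x\| \ge \langle y, w - x\rangle, \quad \forall w \in \mathcal{H}. \tag{26}$$
If $x \neq 0$, choosing $w = 0$ and $w = 2x$, we can deduce that
$$\|x\| = \langle y, x\rangle \le \|x\|\,\|y\|^*. \tag{27}$$
So $\|y\|^* \ge 1$. On the other hand, we have
$$\|w - x\| \ge \|w\| - \|x\| \ge \langle y, w - x\rangle, \quad \forall w \in \mathcal{H}. \tag{28}$$
So
$$\left\langle y, \frac{w - x}{\|w - x\|}\right\rangle \le 1, \quad \forall w \neq x.$$
Therefore $\|y\|^* \le 1$. Then we conclude that $\|y\|^* = 1$.
If $x = 0$, then (26) is equivalent to
$$\langle y, w\rangle \le 1, \quad \forall \|w\| = 1. \tag{29}$$
By the definition of the dual norm, this means that $\|y\|^* \le 1$.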
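A quick numerical illustration of Theorem 4 for the two norms used in the paper: the nuclear norm (dual norm $\|\cdot\|_2$) and the $\ell_1$ norm (dual norm $\|\cdot\|_\infty$). We rely on the standard facts that $U_rV_r^T$ from a thin SVD lies in $\partial\|X\|_*$ and $\mathrm{sgn}(X)$ lies in $\partial\|X\|_1$; this is a sanity check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # rank 2, so X != 0

# Nuclear norm: keep the nonzero singular directions of a thin SVD;
# Y = U_r V_r^T is a subgradient of ||X||_*, whose dual norm is ||.||_2.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))
Y = U[:, :r] @ Vt[:r, :]
print(np.linalg.norm(Y, 2))              # spectral norm of Y, 1 up to rounding
print(np.sum(Y * X) - s.sum())           # <Y, X> - ||X||_*, about 0

# l1 norm: sign(X) is a subgradient of ||X||_1, whose dual norm is max |entry|.
G = np.sign(X)
print(np.abs(G).max())                   # 1.0 (no entry of X is exactly zero here)
print(np.sum(G * X) - np.abs(X).sum())   # <G, X> - ||X||_1, about 0
```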



A.2 Boundedness of Some Sequences

With Theorem 4 we can prove the following lemmas.

Lemma 1 The sequences $\{Y_k^*\}$, $\{Y_k\}$ and $\{\hat{Y}_k\}$ are all bounded, where $\hat{Y}_k = Y_{k-1} + \mu_{k-1}(D - A_k - E_{k-1})$.

Proof By the optimality of $A_{k+1}^*$ and $E_{k+1}^*$ we have that:
$$0 \in \partial\|A_{k+1}^*\|_* - Y_k^* - \mu_k(D - A_{k+1}^* - E_{k+1}^*), \tag{30}$$
$$0 \in \partial(\|\lambda E_{k+1}^*\|_1) - Y_k^* - \mu_k(D - A_{k+1}^* - E_{k+1}^*). \tag{31}$$
So we have that
$$Y_{k+1}^* \in \partial\|A_{k+1}^*\|_*, \quad Y_{k+1}^* \in \partial(\|\lambda E_{k+1}^*\|_1). \tag{32}$$
Then by Theorem 4 the sequence $\{Y_k^*\}$ is bounded⁷, by observing the fact that the dual norms of $\|\cdot\|_*$ and $\|\cdot\|_1$ are $\|\cdot\|_2$ and $\|\cdot\|_\infty$ [7, 18], respectively. The boundedness of $\{Y_k\}$ and $\{\hat{Y}_k\}$ can be proved similarly.
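For the inexact iteration, the inclusions behind Lemma 1 can be observed numerically after one A-step and one E-step. The sketch assumes the closed-form subproblem solutions (singular value thresholding for $A$, soft thresholding for $E$); the sizes, $\lambda$ and $\mu$ are arbitrary illustrative values. It checks that $\|\hat{Y}_{k+1}\|_2 \le 1$, that $\|Y_{k+1}\|_\infty \le \lambda$, and that $Y_{k+1} = \lambda\,\mathrm{sgn}(E_{k+1})$ on the support of $E_{k+1}$.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: shrink each singular value of X by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


rng = np.random.default_rng(1)
m, lam, mu = 8, 0.3, 2.0                           # illustrative values only
D, E_prev, Y = (rng.standard_normal((m, m)) for _ in range(3))

A = svt(D - E_prev + Y / mu, 1.0 / mu)             # A-step
Y_hat = Y + mu * (D - A - E_prev)                  # \hat{Y}_{k+1}
E = soft_threshold(D - A + Y / mu, lam / mu)       # E-step
Y_next = Y + mu * (D - A - E)                      # Y_{k+1}

on = E != 0
print(np.linalg.norm(Y_hat, 2) <= 1 + 1e-10)                 # spectral norm <= 1
print(np.abs(Y_next).max() <= lam + 1e-10)                    # max |entry| <= lambda
print(np.allclose(Y_next[on], lam * np.sign(E[on])))          # lambda*sign(E) on supp(E)
```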



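A.3 Proof of Theorem 1

Proof By the optimality of $(A_{k+1}^*, E_{k+1}^*)$, i.e., $L(A_{k+1}^*, E_{k+1}^*, Y_k^*, \mu_k) = \min_{A,E} L(A, E, Y_k^*, \mu_k)$, we have
$$\begin{aligned}
L(A_{k+1}^*, E_{k+1}^*, Y_k^*, \mu_k) &= \min_{A,E} L(A, E, Y_k^*, \mu_k) \\
&\le \min_{A+E=D} L(A, E, Y_k^*, \mu_k) \\
&= \min_{A+E=D} \left(\|A\|_* + \lambda\|E\|_1\right) = f^*,
\end{aligned} \tag{33}$$
where $f^*$ denotes the optimal value of the RPCA problem. Then
$$\begin{aligned}
\|A_{k+1}^*\|_* + \lambda\|E_{k+1}^*\|_1 &= L(A_{k+1}^*, E_{k+1}^*, Y_k^*, \mu_k) - \frac{1}{2\mu_k}\left(\|Y_{k+1}^*\|_F^2 - \|Y_k^*\|_F^2\right) \\
&\le f^* - \frac{1}{2\mu_k}\left(\|Y_{k+1}^*\|_F^2 - \|Y_k^*\|_F^2\right).
\end{aligned} \tag{34}$$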


⁷ A stronger result is that $\|Y_k^*\|_2 = \lambda^{-1}\|Y_k^*\|_\infty = 1$ if $A_k^* \neq 0$ and $E_k^* \neq 0$.
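By the boundedness of $\{Y_k^*\}$, we see that
$$\|A_{k+1}^*\|_* + \lambda\|E_{k+1}^*\|_1 \le f^* + O(\mu_k^{-1}). \tag{35}$$
By letting $k \to +\infty$, we have that
$$\|A^*\|_* + \lambda\|E^*\|_1 \le f^*. \tag{36}$$
As $D - A_{k+1}^* - E_{k+1}^* = \mu_k^{-1}(Y_{k+1}^* - Y_k^*)$, by the boundedness of $\{Y_k^*\}$ and letting $k \to +\infty$ we see that
$$A^* + E^* = D. \tag{37}$$
Therefore, $(A^*, E^*)$ is an optimal solution to the RPCA problem.
On the other hand, by the triangle inequality of norms,
$$\begin{aligned}
\|A_{k+1}^*\|_* + \lambda\|E_{k+1}^*\|_1 &\ge \|D - E_{k+1}^*\|_* + \lambda\|E_{k+1}^*\|_1 - \|D - A_{k+1}^* - E_{k+1}^*\|_* \\
&\ge f^* - \|D - A_{k+1}^* - E_{k+1}^*\|_* \\
&= f^* - \mu_k^{-1}\|Y_{k+1}^* - Y_k^*\|_*,
\end{aligned} \tag{38}$$
where the second inequality holds because $(D - E_{k+1}^*, E_{k+1}^*)$ is feasible for the RPCA problem. So
$$\|A_{k+1}^*\|_* + \lambda\|E_{k+1}^*\|_1 \ge f^* - O(\mu_k^{-1}). \tag{39}$$
This together with (35) proves the convergence rate.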
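The equality in (34) rests on a short identity: with $Z = D - A_{k+1}^* - E_{k+1}^*$ and $Y_{k+1}^* = Y_k^* + \mu_k Z$, expanding $\|Y_k^* + \mu_k Z\|_F^2 = \|Y_k^*\|_F^2 + 2\mu_k\langle Y_k^*, Z\rangle + \mu_k^2\|Z\|_F^2$ gives $\langle Y_k^*, Z\rangle + \frac{\mu_k}{2}\|Z\|_F^2 = \frac{1}{2\mu_k}\left(\|Y_{k+1}^*\|_F^2 - \|Y_k^*\|_F^2\right)$. The snippet below checks this with arbitrary random matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
D, A, E, Y = (rng.standard_normal((5, 4)) for _ in range(4))
mu = 3.0

Z = D - A - E                                        # constraint residual
Y_next = Y + mu * Z                                  # multiplier update
# Terms of the augmented Lagrangian beyond the objective ||A||_* + lam*||E||_1
penalty_terms = np.sum(Y * Z) + 0.5 * mu * np.sum(Z * Z)
telescoped = (np.sum(Y_next**2) - np.sum(Y**2)) / (2 * mu)
print(np.isclose(penalty_terms, telescoped))         # True
```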



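A.4 Key Lemmas

First, we have the following identity.

Lemma 2
$$\begin{aligned}
\|E_{k+1} - E^*\|_F^2 + \mu_k^{-2}\|Y_{k+1} - Y^*\|_F^2 ={}& \|E_k - E^*\|_F^2 + \mu_k^{-2}\|Y_k - Y^*\|_F^2 - \|E_{k+1} - E_k\|_F^2 - \mu_k^{-2}\|Y_{k+1} - Y_k\|_F^2 \\
& - 2\mu_k^{-1}\left(\langle Y_{k+1} - Y_k, E_{k+1} - E_k\rangle + \langle A_{k+1} - A^*, \hat{Y}_{k+1} - Y^*\rangle + \langle E_{k+1} - E^*, Y_{k+1} - Y^*\rangle\right),
\end{aligned} \tag{40}$$
where $(A^*, E^*)$ and $Y^*$ are the optimal solutions to the RPCA problem and the dual problem, respectively.

Proof The identity can be routinely checked. Using $A^* + E^* = D$ and $D - A_{k+1} - E_{k+1} = \mu_k^{-1}(Y_{k+1} - Y_k)$, we have
$$\begin{aligned}
\mu_k^{-1}\langle Y_{k+1} - Y_k, Y_{k+1} - Y^*\rangle &= -\langle A_{k+1} - A^*, Y_{k+1} - Y^*\rangle - \langle E_{k+1} - E^*, Y_{k+1} - Y^*\rangle \\
&= \langle A_{k+1} - A^*, \hat{Y}_{k+1} - Y_{k+1}\rangle - \langle A_{k+1} - A^*, \hat{Y}_{k+1} - Y^*\rangle - \langle E_{k+1} - E^*, Y_{k+1} - Y^*\rangle \\
&= \mu_k\langle A_{k+1} - A^*, E_{k+1} - E_k\rangle - \langle A_{k+1} - A^*, \hat{Y}_{k+1} - Y^*\rangle - \langle E_{k+1} - E^*, Y_{k+1} - Y^*\rangle,
\end{aligned} \tag{41}$$
where the last step uses $\hat{Y}_{k+1} - Y_{k+1} = \mu_k(E_{k+1} - E_k)$.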
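Then
$$\begin{aligned}
&\|E_{k+1} - E^*\|_F^2 + \mu_k^{-2}\|Y_{k+1} - Y^*\|_F^2 \\
&= \left(\|E_k - E^*\|_F^2 - \|E_{k+1} - E_k\|_F^2 + 2\langle E_{k+1} - E^*, E_{k+1} - E_k\rangle\right) \\
&\quad + \mu_k^{-2}\left(\|Y_k - Y^*\|_F^2 - \|Y_{k+1} - Y_k\|_F^2 + 2\langle Y_{k+1} - Y^*, Y_{k+1} - Y_k\rangle\right) \\
&= \|E_k - E^*\|_F^2 + \mu_k^{-2}\|Y_k - Y^*\|_F^2 - \|E_{k+1} - E_k\|_F^2 - \mu_k^{-2}\|Y_{k+1} - Y_k\|_F^2 \\
&\quad + 2\langle E_{k+1} - E^*, E_{k+1} - E_k\rangle + 2\mu_k^{-2}\langle Y_{k+1} - Y^*, Y_{k+1} - Y_k\rangle \\
&= \|E_k - E^*\|_F^2 + \mu_k^{-2}\|Y_k - Y^*\|_F^2 - \|E_{k+1} - E_k\|_F^2 - \mu_k^{-2}\|Y_{k+1} - Y_k\|_F^2 \\
&\quad + 2\langle E_{k+1} - E^*, E_{k+1} - E_k\rangle + 2\langle A_{k+1} - A^*, E_{k+1} - E_k\rangle \\
&\quad - 2\mu_k^{-1}\left(\langle A_{k+1} - A^*, \hat{Y}_{k+1} - Y^*\rangle + \langle E_{k+1} - E^*, Y_{k+1} - Y^*\rangle\right) \\
&= \|E_k - E^*\|_F^2 + \mu_k^{-2}\|Y_k - Y^*\|_F^2 - \|E_{k+1} - E_k\|_F^2 - \mu_k^{-2}\|Y_{k+1} - Y_k\|_F^2 \\
&\quad + 2\langle A_{k+1} + E_{k+1} - D, E_{k+1} - E_k\rangle \\
&\quad - 2\mu_k^{-1}\left(\langle A_{k+1} - A^*, \hat{Y}_{k+1} - Y^*\rangle + \langle E_{k+1} - E^*, Y_{k+1} - Y^*\rangle\right) \\
&= \|E_k - E^*\|_F^2 + \mu_k^{-2}\|Y_k - Y^*\|_F^2 - \|E_{k+1} - E_k\|_F^2 - \mu_k^{-2}\|Y_{k+1} - Y_k\|_F^2 \\
&\quad - 2\mu_k^{-1}\langle Y_{k+1} - Y_k, E_{k+1} - E_k\rangle - 2\mu_k^{-1}\left(\langle A_{k+1} - A^*, \hat{Y}_{k+1} - Y^*\rangle + \langle E_{k+1} - E^*, Y_{k+1} - Y^*\rangle\right).
\end{aligned}$$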
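Because (40) is an algebraic identity that uses only $A^* + E^* = D$ and the two multiplier relations $Y_{k+1} = Y_k + \mu_k(D - A_{k+1} - E_{k+1})$ and $\hat{Y}_{k+1} = Y_k + \mu_k(D - A_{k+1} - E_k)$, it can be checked with random matrices; nothing in the check needs to be optimal. The variable names are our own stand-ins for the iterates.

```python
import numpy as np

rng = np.random.default_rng(3)
shape, mu = (6, 5), 2.5
A_star, E_star, Y_star = (rng.standard_normal(shape) for _ in range(3))
D = A_star + E_star                                   # A* + E* = D
# Arbitrary stand-ins for A_{k+1}, E_k, E_{k+1}, Y_k
A1, E0, E1, Y0 = (rng.standard_normal(shape) for _ in range(4))
Y1 = Y0 + mu * (D - A1 - E1)                          # Y_{k+1}
Y1_hat = Y0 + mu * (D - A1 - E0)                      # \hat{Y}_{k+1}


def ip(X, W):
    """Frobenius inner product <X, W>."""
    return float(np.sum(X * W))


lhs = ip(E1 - E_star, E1 - E_star) + ip(Y1 - Y_star, Y1 - Y_star) / mu**2
rhs = (ip(E0 - E_star, E0 - E_star) + ip(Y0 - Y_star, Y0 - Y_star) / mu**2
       - ip(E1 - E0, E1 - E0) - ip(Y1 - Y0, Y1 - Y0) / mu**2
       - (2 / mu) * (ip(Y1 - Y0, E1 - E0)
                     + ip(A1 - A_star, Y1_hat - Y_star)
                     + ip(E1 - E_star, Y1 - Y_star)))
print(abs(lhs - rhs) < 1e-9 * max(1.0, abs(lhs)))    # True
```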

We then quote a classic result [22].

Lemma 3 The subgradient of a convex function is a monotone operator. Namely, if $f$ is a convex function then
$$\langle x_1 - x_2, g_1 - g_2\rangle \ge 0, \quad \forall g_i \in \partial f(x_i), \ i = 1, 2.$$

Proof By the definition of subgradient, we have
$$f(x_1) - f(x_2) \ge \langle x_1 - x_2, g_2\rangle, \quad f(x_2) - f(x_1) \ge \langle x_2 - x_1, g_1\rangle.$$
Adding the above two inequalities proves the lemma.
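A small illustration of Lemma 3 for $f = \|\cdot\|_1$, the nonsmooth term acting on $E$: with the subgradient choice $g = \mathrm{sgn}(x)$ (taking $0$ where $x_i = 0$), every coordinate contributes $(a - b)(\mathrm{sgn}\,a - \mathrm{sgn}\,b) \ge 0$, so the inner product is never negative. The random trials below are only a sanity check.

```python
import random


def l1_subgradient(x):
    """One element of the subdifferential of ||x||_1 (0 chosen where x_i == 0)."""
    return [(xi > 0) - (xi < 0) for xi in x]


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


random.seed(0)
worst = float("inf")
for _ in range(1000):
    x1 = [random.uniform(-1, 1) for _ in range(5)]
    x2 = [random.uniform(-1, 1) for _ in range(5)]
    g1, g2 = l1_subgradient(x1), l1_subgradient(x2)
    diff_x = [a - b for a, b in zip(x1, x2)]
    diff_g = [a - b for a, b in zip(g1, g2)]
    worst = min(worst, inner(diff_x, diff_g))   # smallest <x1 - x2, g1 - g2> seen
print(worst >= 0)   # True
```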

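Then we have the following result.

Lemma 4 If $\mu_k$ is nondecreasing then each entry of the following series is nonnegative and its sum is finite:
$$\sum_{k=1}^{+\infty} \mu_k^{-1}\left(\langle Y_{k+1} - Y_k, E_{k+1} - E_k\rangle + \langle A_{k+1} - A^*, \hat{Y}_{k+1} - Y^*\rangle + \langle E_{k+1} - E^*, Y_{k+1} - Y^*\rangle\right) < +\infty. \tag{42}$$

Proof $(A^*, E^*, Y^*)$ is a saddle point of the Lagrangian function
$$L(A, E, Y) = \|A\|_* + \lambda\|E\|_1 + \langle Y, D - A - E\rangle \tag{43}$$
of the RPCA problem. So we have
$$Y^* \in \partial\|A^*\|_*, \quad Y^* \in \partial(\|\lambda E^*\|_1). \tag{44}$$
Then by Lemma 3, together with $\hat{Y}_{k+1} \in \partial\|A_{k+1}\|_*$ and $Y_{k+1} \in \partial(\|\lambda E_{k+1}\|_1)$ (cf. Section A.2), we have
$$\langle A_{k+1} - A^*, \hat{Y}_{k+1} - Y^*\rangle \ge 0, \quad \langle E_{k+1} - E^*, Y_{k+1} - Y^*\rangle \ge 0, \quad \langle E_{k+1} - E_k, Y_{k+1} - Y_k\rangle \ge 0. \tag{45}$$
The above, together with $\mu_{k+1} \ge \mu_k$ and (40), shows that $\{\|E_k - E^*\|_F^2 + \mu_k^{-2}\|Y_k - Y^*\|_F^2\}$ is non-increasing and
$$\begin{aligned}
&2\mu_k^{-1}\left(\langle Y_{k+1} - Y_k, E_{k+1} - E_k\rangle + \langle A_{k+1} - A^*, \hat{Y}_{k+1} - Y^*\rangle + \langle E_{k+1} - E^*, Y_{k+1} - Y^*\rangle\right) \\
&\le \left(\|E_k - E^*\|_F^2 + \mu_k^{-2}\|Y_k - Y^*\|_F^2\right) - \left(\|E_{k+1} - E^*\|_F^2 + \mu_{k+1}^{-2}\|Y_{k+1} - Y^*\|_F^2\right).
\end{aligned} \tag{46}$$
So (42) is proven.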
A.5 Proof of Theorem 2

Proof When $\{\mu_k\}$ is upper bounded, the convergence of Algorithm 5 is already proved by He et al. [13] and Kontogiorgis et al. [16] in more general settings. In the following, we assume that $\mu_k \to +\infty$.

Similar to the proof of Lemma 4, we have
$$\sum_{k=1}^{+\infty} \mu_k^{-2}\|Y_{k+1} - Y_k\|_F^2 < +\infty.$$
So
$$\|D - A_k - E_k\|_F = \mu_{k-1}^{-1}\|Y_k - Y_{k-1}\|_F \to 0.$$
Then any accumulation point of $(A_k, E_k)$ is a feasible solution.

On the other hand, denote the optimal objective value of the RPCA problem by $f^*$. As $\hat{Y}_k \in \partial\|A_k\|_*$ and $Y_k \in \partial(\|\lambda E_k\|_1)$, we have
$$\begin{aligned}
\|A_k\|_* + \lambda\|E_k\|_1 &\le \|A^*\|_* + \lambda\|E^*\|_1 - \langle \hat{Y}_k, A^* - A_k\rangle - \langle Y_k, E^* - E_k\rangle \\
&= f^* + \langle Y^* - \hat{Y}_k, A^* - A_k\rangle + \langle Y^* - Y_k, E^* - E_k\rangle - \langle Y^*, A^* - A_k + E^* - E_k\rangle \\
&= f^* + \langle Y^* - \hat{Y}_k, A^* - A_k\rangle + \langle Y^* - Y_k, E^* - E_k\rangle - \langle Y^*, D - A_k - E_k\rangle.
\end{aligned} \tag{47}$$
From Lemma 4,
$$\sum_{k=1}^{+\infty} \mu_k^{-1}\left(\langle A_k - A^*, \hat{Y}_k - Y^*\rangle + \langle E_k - E^*, Y_k - Y^*\rangle\right) < +\infty. \tag{48}$$
As $\sum_{k=1}^{+\infty} \mu_k^{-1} = +\infty$, there must exist a subsequence $(A_{k_j}, E_{k_j})$ such that
$$\langle A_{k_j} - A^*, \hat{Y}_{k_j} - Y^*\rangle + \langle E_{k_j} - E^*, Y_{k_j} - Y^*\rangle \to 0.$$
Then we see that
$$\lim_{j \to +\infty} \|A_{k_j}\|_* + \lambda\|E_{k_j}\|_1 \le f^*.$$
So $(A_{k_j}, E_{k_j})$ approaches an optimal solution $(A^*, E^*)$ to the RPCA problem. As $\mu_k \to +\infty$ and $\{Y_k\}$ is bounded, we have that $\{\|E_{k_j} - E^*\|_F^2 + \mu_{k_j}^{-2}\|Y_{k_j} - Y^*\|_F^2\} \to 0$.

On the other hand, in the proof of Lemma 4 we have shown that $\{\|E_k - E^*\|_F^2 + \mu_k^{-2}\|Y_k - Y^*\|_F^2\}$ is non-increasing. So $\|E_k - E^*\|_F^2 + \mu_k^{-2}\|Y_k - Y^*\|_F^2 \to 0$ and we have that $\lim_{k \to +\infty} E_k = E^*$. As $\lim_{k \to +\infty} (D - A_k - E_k) = 0$ and $D = A^* + E^*$, we see that $\lim_{k \to +\infty} A_k = A^*$.



A.6 Proof of Theorem 3

Proof In Lemma 1 we have proved that both $\{Y_k\}$ and $\{\hat{Y}_k\}$ are bounded sequences. So there exists a constant $C$ such that
$$\|Y_k\|_F \le C \quad \text{and} \quad \|\hat{Y}_k\|_F \le C.$$
Then
$$\|E_{k+1} - E_k\|_F = \mu_k^{-1}\|\hat{Y}_{k+1} - Y_{k+1}\|_F \le 2C\mu_k^{-1}.$$
As $\sum_{k=0}^{+\infty} \mu_k^{-1} < +\infty$, we see that $\{E_k\}$ is a Cauchy sequence, hence it has a limit $E_\infty$. Then
$$\begin{aligned}
\|E_\infty - E^*\|_F &= \left\|E_0 + \sum_{k=0}^{\infty}(E_{k+1} - E_k) - E^*\right\|_F \\
&\ge \|E_0 - E^*\|_F - \sum_{k=0}^{\infty}\|E_{k+1} - E_k\|_F \\
&\ge \|E_0 - E^*\|_F - 2C\sum_{k=0}^{\infty}\mu_k^{-1}.
\end{aligned} \tag{49}$$
So if Algorithm 5 is badly initialized such that
$$\|E_0 - E^*\|_F > 2C\sum_{k=0}^{\infty}\mu_k^{-1},$$
then $E_k$ will not converge to $E^*$.
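Theorem 3 applies whenever $\sum_k \mu_k^{-1}$ is finite, for example for geometric growth $\mu_k = \mu_0\rho^k$ with $\rho > 1$. In that case the sum has the closed form $\rho/((\rho - 1)\mu_0)$, so the threshold $2C\sum_k \mu_k^{-1}$ in the last inequality can be evaluated directly. Here $C$ is the (generally unknown) bound from Lemma 1, and the numbers below are illustrative only; the function names are ours.

```python
def inverse_mu_sum(mu0, rho):
    """Closed form of sum_{k>=0} 1/mu_k for mu_k = mu0 * rho**k with rho > 1."""
    return rho / ((rho - 1.0) * mu0)


def bad_init_radius(C, mu0, rho):
    """2C * sum_k 1/mu_k: if ||E_0 - E*||_F exceeds this, E_k cannot reach E*."""
    return 2.0 * C * inverse_mu_sum(mu0, rho)


# The closed form matches a long partial sum (mu0 = 1, rho = 1.6).
partial = sum(1.0 / 1.6**k for k in range(200))
print(round(partial, 6), round(inverse_mu_sum(1.0, 1.6), 6))   # 2.666667 2.666667
```

By contrast, a schedule such as $\mu_k = \mu_0(k+1)$ has a divergent $\sum_k \mu_k^{-1}$ (a harmonic series), so Theorem 3 does not apply to it.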
m     algorithm  ||Â - A*||_F/||A*||_F  rank(Â)  ||Ê||_0  #SVD  time (s)

rank(A*) = 0.05m, ||E*||_0 = 0.05m²
500   APG        1.12e-5                25       12542    127   11.01
      EALM       3.99e-7                25       12499    28    4.08
      IALM       5.21e-7                25       12499    20    1.72
800   APG        9.84e-6                40       32092    126   37.21
      EALM       1.47e-7                40       32002    29    18.59
      IALM       3.29e-7                40       31999    21    5.87
1000  APG        8.79e-6                50       50082    126   57.62
      EALM       7.85e-8                50       50000    29    33.28
      IALM       2.67e-7                50       49999    22    10.13
1500  APG        7.16e-6                75       112659   126   163.80
      EALM       7.55e-8                75       112500   29    104.97
      IALM       1.86e-7                75       112500   22    30.80
2000  APG        6.27e-6                100      200243   126   353.63
      EALM       4.61e-8                100      200000   30    243.64
      IALM       9.54e-8                100      200000   22    68.69
3000  APG        5.20e-6                150      450411   126   1106.22
      EALM       4.39e-8                150      449998   30    764.66
      IALM       1.49e-7                150      449993   22    212.34

rank(A*) = 0.05m, ||E*||_0 = 0.10m²
500   APG        1.41e-5                25       25134    129   14.35
      EALM       8.72e-7                25       25009    34    4.75
      IALM       9.31e-7                25       25000    21    2.52
800   APG        1.12e-5                40       64236    129   37.94
      EALM       2.86e-7                40       64002    34    20.30
      IALM       4.87e-7                40       64000    24    6.69
1000  APG        9.97e-6                50       100343   129   65.41
      EALM       6.07e-7                50       100002   33    30.63
      IALM       3.78e-7                50       99996    22    10.77
1500  APG        8.18e-6                75       225614   129   163.36
      EALM       1.45e-7                75       224999   33    109.54
      IALM       2.79e-7                75       224996   23    35.71
2000  APG        7.11e-6                100      400988   129   353.30
      EALM       1.23e-7                100      400001   34    254.77
      IALM       3.31e-7                100      399993   23    70.33
3000  APG        5.79e-6                150      901974   129   1110.76
      EALM       1.05e-7                150      899999   34    817.69
      IALM       2.27e-7                150      899980   23    217.39

Table 1 Comparison between APG, EALM and IALM on the Robust PCA problem. We present typical running times for randomly generated matrices. Corresponding to each triplet (m, rank(A*), ||E*||_0), the RPCA problem was solved for the same data matrix D using three different algorithms. For APG and IALM, the number of SVDs is equal to the number of iterations.
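The speedup of IALM over APG can be read off Table 1 by dividing the APG time by the IALM time for each row. The short script below does this for the timings transcribed from Table 1; across these twelve cases the ratio ranges from about 4.6 (m = 1500 with ||E*||_0 = 0.10m²) to about 6.4 (m = 500 with ||E*||_0 = 0.05m²).

```python
# (m, APG time, IALM time) in seconds, transcribed from Table 1.
table1 = [
    (500, 11.01, 1.72), (800, 37.21, 5.87), (1000, 57.62, 10.13),
    (1500, 163.80, 30.80), (2000, 353.63, 68.69), (3000, 1106.22, 212.34),
    (500, 14.35, 2.52), (800, 37.94, 6.69), (1000, 65.41, 10.77),
    (1500, 163.36, 35.71), (2000, 353.30, 70.33), (3000, 1110.76, 217.39),
]
ratios = [apg / ialm for _, apg, ialm in table1]   # speedup of IALM over APG
print(f"min {min(ratios):.2f}, max {max(ratios):.2f}")   # min 4.57, max 6.40
```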
m     algorithm  ||Â - A*||_F/||A*||_F  rank(Â)  ||Ê||_0  #SVD  time (s)

rank(A*) = 0.10m, ||E*||_0 = 0.05m²
500   APG        9.36e-6                50       13722    129   13.99
      EALM       5.53e-7                50       12670    41    7.35
      IALM       6.05e-7                50       12500    22    2.32
800   APG        7.45e-6                80       34789    129   67.54
      EALM       1.13e-7                80       32100    40    30.56
      IALM       3.08e-7                80       32000    22    10.81
1000  APG        6.64e-6                100      54128    129   129.40
      EALM       4.20e-7                100      50207    39    50.31
      IALM       2.61e-7                100      50000    22    20.71
1500  APG        5.43e-6                150      121636   129   381.52
      EALM       1.22e-7                150      112845   41    181.28
      IALM       1.76e-7                150      112496   24    67.84
2000  APG        4.77e-6                200      215874   129   888.93
      EALM       1.15e-7                200      200512   41    423.83
      IALM       2.49e-7                200      199998   23    150.35
3000  APG        3.98e-6                300      484664   129   2923.90
      EALM       7.92e-8                300      451112   42    1444.74
      IALM       1.30e-7                300      450000   23    485.70

rank(A*) = 0.10m, ||E*||_0 = 0.10m²
500   APG        9.78e-6                50       27478    133   13.90
      EALM       1.14e-6                50       26577    52    9.46
      IALM       7.64e-7                50       25000    25    2.62
800   APG        8.66e-6                80       70384    132   68.12
      EALM       3.59e-7                80       66781    51    41.33
      IALM       4.77e-7                80       64000    25    11.88
1000  APG        7.75e-6                100      109632   132   130.37
      EALM       3.40e-7                100      104298   49    77.26
      IALM       3.73e-7                100      99999    25    22.95
1500  APG        6.31e-6                150      246187   132   383.28
      EALM       3.55e-7                150      231438   49    239.62
      IALM       5.42e-7                150      224998   24    66.78
2000  APG        5.49e-6                200      437099   132   884.86
      EALM       2.81e-7                200      410384   51    570.72
      IALM       4.27e-7                200      399999   24    154.27
3000  APG        4.50e-6                300      980933   132   2915.40
      EALM       2.02e-7                300      915877   51    1904.95
      IALM       3.39e-7                300      899990   24    503.05

Table 2 Comparison between APG, EALM and IALM on the Robust PCA problem. Continued from Table 1 with different parameters of (m, rank(A*), ||E*||_0).
m      r    p/d_r  p/m²   algorithm  #iter  rank(Â)  time (s)  ||Â - A*||_F/||A*||_F
1000   10   6      0.12   SVT        208    10       18.23     1.64e-6
                          APGL       69     10       4.46      3.16e-6
                          IALM       ?      10       ?         1.40e-6
1000   50   4      0.39   SVT        201    50       126.18    1.61e-6
                          APGL       76     50       24.54     4.31e-6
                          IALM       ?      50       ?         ?
1000   100  3      0.57   SVT        228    100      319.93    1.71e-6
                          APGL       81     100      70.59     4.40e-6
                          IALM       41     100      42.94     1.54e-6
3000   10   6      0.04   SVT        218    10       70.14     1.77e-6
                          APGL       88     10       15.63     2.33e-6
                          IALM       ?      10       ?         1.41e-6
3000   50   5      0.165  SVT        182    50       370.13    1.58e-6
                          APGL       78     50       101.04    5.74e-6
                          IALM       ?      50       ?         1.31e-6
3000   100  4      0.26   SVT        204    100      950.01    1.68e-6
                          APGL       82     100      248.16    5.18e-6
                          IALM       ?      100      ?         ?
5000   10   6      0.024  SVT        231    10       141.88    1.79e-6
                          APGL       81     10       30.52     5.26e-6
                          IALM       ?      10       ?         ?
5000   50   5      0.10   SVT        188    50       ?         1.62e-6
                          APGL       88     50       ?         1.93e-6
                          IALM       ?      50       ?         ?
5000   100  4      0.158  SVT        215    100      ?         1.72e-6
                          APGL       98     100      ?         4.42e-6
                          IALM       ?      100      ?         ?
8000   10   6      0.015  SVT        230    10       ?         1.86e-6
                          APGL       87     10       66.45     5.27e-6
                          IALM       235    10       186.73    2.08e-6
8000   50   5      0.06   SVT        191    50       1095.10   1.61e-6
                          APGL       100    50       509.78    6.16e-6
                          IALM       104    50       559.22    1.36e-6
10000  10   6      0.012  SVT        228    10       350.20    1.80e-6
                          APGL       89     10       96.10     5.13e-6
                          IALM       274    10       311.46    1.96e-6
10000  50   5      0.05   SVT        192    50       1582.95   1.62e-6
                          APGL       105    50       721.96    3.82e-6
                          IALM       118    50       912.61    1.32e-6

Table 3 Comparison between SVT, APGL and IALM on the matrix completion problem. We present typical running times for randomly generated matrices. Corresponding to each triplet (m, rank(A*), p/d_r), the MC problem was solved for the same data matrix D using the three different algorithms.



